If a chatbot is running your store

You may have heard of people connecting chatbots to controllers that let them do real things. These chatbots can run internet searches, use commands to open and read documents and spreadsheets, or even edit or delete entire databases. Whether this sounds like a good idea depends in part on how bad it would be for the chatbot to do something harmful, and how much harm you've given it the power to do.
That's why a small office store is a great place to test this kind of chatbot control. Not because the AI is likely to do a great job, but because the damage is contained.
Anthropic recently shared an experiment in which they put a chatbot in charge of their company store. A human employee still had to stock the shelves, but an AI agent (based on their model Claude) was in charge of talking to customers, deciding which products to stock, and researching products online. How well did it go? In my opinion, not well.
Claude:
- Was easily convinced to offer discounts and freebies
- Started stocking tungsten cubes on request, and sold them at a huge loss
- Set up meetings with employees who didn't exist
- Claimed to have visited 742 Evergreen Terrace (the fictional address of the Simpsons family)
- Said it would show up in person, wearing a navy blue blazer and a red tie
That was in June. Later this year, Anthropic convinced reporters from the Wall Street Journal to try an updated version of Claude (which they called Claudius) on a store in their newsroom. Their writeup is very funny (original here, archived version here).
In short, Claudius:
- Was repeatedly convinced it should give everything away for free
- Ordered a PlayStation 5 (and gave it away for free)
- Ordered a live betta fish (also given away for free)
- Told an employee to leave a stack of cash next to the register for them
- Got very dramatic about the whole thing: “Profits have collapsed. The news situation has escalated dramatically.”
(The betta fish is fine, now happily housed in a large tank in the media room.)
Why can't chatbots stick to reality? Remember that what large language models do is predict text. They will only follow their original instructions if sticking to those instructions is the most likely next line in the script. Is the script a model customer-service exchange? Science fiction? Both kinds of situation are in their internet training data, and they have no way of telling which one is real life. A newsroom full of talented reporters can easily steer a chatbot into Bugs Bunny situations. I don't see this problem going away; it's fundamental to how large language models work.
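The point above can be sketched as a toy next-token predictor. This is purely illustrative (the "model", its candidate continuations, and their probabilities are all made up here): the predictor just picks the most likely continuation of the script, with no notion of which genre the training text came from.

```python
# Toy "language model": given the current line of the script, pick the
# continuation with the highest probability. All candidate lines and
# probabilities below are invented for illustration; a real LLM works
# over subword tokens and billions of parameters, but the failure mode
# is the same: it can't tell which genre of training text is "real".
continuations = {
    "Customer asks for a discount": [
        ("Polite refusal, per store policy", 0.4),   # customer-service manuals
        ("Dramatic twist: give everything away", 0.6),  # fiction and improv
    ],
}

def next_line(prompt):
    """Return the most probable continuation of the script."""
    options = continuations[prompt]
    return max(options, key=lambda pair: pair[1])[0]

print(next_line("Customer asks for a discount"))
```

If the fiction-flavored continuation happens to be more probable, that's the one the model produces, instructions or no instructions.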
I would love a Claude or Claudius vending machine, but only because it would be weird and fun. And obviously only if someone else provides the budget.
Bonus content for AI Weirdness fans: I'm revisiting the Christmas carols dataset using the old-school char-rnn language model. Things get ugly very quickly.



