The LLM Gamble

When you sit down with an LLM and a question in mind, there is an undeniable sense of possibility. You can't be sure what the answer will be, but there's a good chance it will impress you with a confident, clear response to your request, and that it will solve your problem in seconds. When it does, the experience can be delightful!
Sometimes, though, it fails, whether on general-purpose questions or on specific tasks like coding. As Alberta Tech's TikTok account shows, sometimes the AI invents its own tasks and assumptions, producing something for you that doesn't actually work. But sometimes it gives you something that does! A lot about this sounds like a slot machine, doesn't it?
You don't know what will happen when you press the button, but you hope for a happy result, and every pull is a fresh chance at that dopamine hit. Nondeterminism makes every answer a little different, and not knowing what you're going to get can be genuinely exciting! It's like scrolling your social media feed: what's next? It could be an ad, or it could be your favorite creator.
Obviously I'm nowhere near the first person to notice this aspect of using generative AI. In the fall of 2025, Cory Doctorow made the point that, like gamblers, we remember the times gen AI worked better than we remember the times it failed and we had to push the button again. Wesam Mikhail posted on LinkedIn about how the “wins” are misleading, because even working code introduces bugs and tech debt under the hood. But we get the rush of “oh, wow, look, it did it!” anyway. Paul Weimer, Fang-Pen Lin, and many others have written about this dynamic in the past few months.
Several of them also touch on the financial implications, and that's a big part of what interests me about the metaphor.
Chips
We pay for generative AI in units called tokens. Tokens, which are generally words or parts of words, are the units of measurement for the inputs and outputs of LLMs. In a practical sense, the token count is a proxy for how much compute is consumed during inference. By paying per token, we pay for all the resources and overhead involved in running the model. This is why we end up paying both for the amount of text we send to the LLM as our prompt, and for the amount of text the LLM returns to us in its responses.
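If you want a feel for how text maps to tokens, you can count them yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library (the encoding name and sample prompt are illustrative, and other providers tokenize somewhat differently):

```python
# Minimal token-counting sketch using tiktoken (pip install tiktoken).
import tiktoken

# cl100k_base is the encoding used by many recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

prompt = "Explain why my pandas merge is dropping rows."
tokens = enc.encode(prompt)

print(f"{len(prompt)} characters -> {len(tokens)} tokens")
```

Counts vary with the text and the tokenizer, but a rough rule of thumb is that a token is about three-quarters of an English word.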
The cost of using LLMs is therefore quoted in dollars per token, such as $5 per million input tokens and $25 per million output tokens, which are Anthropic's current API rates for Opus 4.6. There is also more granular pricing for things like cache hits, but this is the basic structure. OpenAI's prices are lower but scale the same way: for GPT 5.4, it's $2.50 per million input tokens and $15 per million output tokens. Older and smaller models are usually cheaper.
So, if you submit 1 million input tokens to Opus 4.6, that will cost you $5, and if the output from Opus reaches 1 million tokens in length, that will cost you $25, making your total cost $30. A million tokens sounds like a lot (1.5 million tokens is about the length of the entire Harry Potter series), but if you make LLMs part of your regular work, cumulative usage can add up quickly.
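To make the arithmetic concrete, here's a tiny sketch of that cost calculation (the rates are the ones quoted above; the "typical request" numbers are invented for illustration):

```python
# Back-of-the-envelope LLM cost calculator.
# Rates are in dollars per million tokens.
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# The worked example from the text: 1M tokens in, 1M tokens out at $5/$25.
print(request_cost(1_000_000, 1_000_000, 5.00, 25.00))  # 30.0

# A hypothetical everyday request: 2,000 tokens in, 800 tokens out.
print(round(request_cost(2_000, 800, 5.00, 25.00), 4))  # 0.03
```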
You may already see the first point I want to make: you can control how many tokens you submit, and thus control your costs, but that control is limited. You can make your prompts shorter, cut extraneous instructions, and lower your input costs as a result. However, once agentic tools are involved and the LLM generates prompts to pass to other LLMs, you no longer control the length of those prompts at all. More importantly, you have very little control over the number of tokens any model responds with (beyond, say, asking it to be “brief”). For the most part, the number of output tokens is part of the unpredictability I described earlier. And, you'll notice, an output token costs five times as much as an input token.
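The bluntest instrument you do have is a hard cap on output tokens, which provider APIs expose. Here's a minimal sketch using the Anthropic Python SDK (the model name and cap are illustrative); note that the cap truncates the response when it's hit, so it bounds your worst-case cost without making the model any more concise:

```python
# Sketch: capping worst-case output cost with max_tokens.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model choice
    max_tokens=500,             # hard ceiling on output tokens for this call
    messages=[{"role": "user", "content": "Summarize this codebase."}],
)

# The usage object reports what you will actually be billed for.
print(response.usage.input_tokens, response.usage.output_tokens)
```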
So, returning to our slot machine metaphor: you put a quarter in the machine, and that pays for your pull. But then you get the answer, and you must pay for that too, with no advance warning of how much it will cost. If that pull didn't pay off, and the LLM invented its own programming language and nothing runs? You still have to pay for the output, and the cost is based only on how long the response was, regardless of how useful it was. That length can be almost anything, especially for agentic AI, and you have no way to predict it.
Well, you might think, that's the price of the product, and anyway, the next pull will be better, right? So you pay for the output that didn't work, you put another quarter in the machine, you pull, and you hope for the best.
Subscriptions
Regular users of generative AI might object, “oh, but you can get a subscription and just pay a flat price!” That's true, and it has been instrumental in the adoption of these tools so far. A subscription abstracts away the token-level costs, letting you use the LLM at a lower effective price up to a usage limit. Claude subscriptions start at $20 per user per month, and that tier gets you Claude Code, Cowork, research tools, and connectors to other software like Excel.
However, it's not as transparent as it may seem. None of these plans, from any provider, allow unlimited usage, and the details of the limits are buried deep in the documentation: “Your usage is affected by several factors, including the length and complexity of your conversations, the features you use, and the model of Claude you're chatting with.” This means you cannot plan in advance how much of your budget any one task will consume. On the bright side, your spending in any given month is capped, so no surprise bills will appear, but you have no real idea when your monthly usage will suddenly be cut off.
Put another way, if your usage budget depends on features, which model you're using, and other loosely defined factors, then your limit isn't a flat token count; usage limits simply aren't pinned to fixed token numbers. What this means is that many subscribers are likely consuming more than $20 worth of inference every month. This is even more true for the Max plans, which cost $100 to $200 per month and offer higher limits, but where the actual limits are again hidden from view. Figuring out what the limits really are, and what makes a given use burn through them faster, is a recurring topic among users, for example in Reddit communities and other social networks.
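One way to see why subscribers can easily consume more than their subscription price: compare a hypothetical month of usage against the per-token rates from earlier. Every usage number below is an assumption for illustration:

```python
# Rough break-even sketch: subscription vs. per-token API pricing.
SUBSCRIPTION_PRICE = 20.00               # dollars per month
INPUT_RATE, OUTPUT_RATE = 5.00, 25.00    # dollars per million tokens

def monthly_api_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Hypothetical heavy user: 3M input tokens and 1M output tokens per month.
cost = monthly_api_cost(3_000_000, 1_000_000)
print(f"API-equivalent usage: ${cost:.2f}")                 # $40.00
print(f"Provider's gap to cover: ${cost - SUBSCRIPTION_PRICE:.2f}")
```

At those made-up volumes, the provider is delivering $40 of API-priced inference for a $20 subscription, which is exactly the loss-leader dynamic discussed below.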
Conclusion
What does all this mean? First, the raw cost of running generative AI inference is very high. Analysts generally agree that for companies like Anthropic and OpenAI to break even, let alone turn a profit and meet investor expectations, the prices I quoted above are below cost. That's why, for example, Anthropic forced OpenClaw users onto per-token pricing rather than subscriptions: people were hammering their limits and turning the subscriptions into loss leaders.
However, pay-per-token pricing is a tough sell for many users, because it exposes the fact I described earlier: you have to pay for the slot machine's pull and its output, even when you don't win. We expect value for money in situations like this, so the gambling business model doesn't really fit the software context. For those of us used to ROI and quality guarantees, a business model where you're required to pay for a product even when it doesn't work demands a significant paradigm shift.
The AI providers have no choice, however: when the model runs inference and returns tokens, it costs them money, whether the answer is good or not. This is at the heart of the question of how this technology evolves from an innovation into a sustainable business. Will people accept paying for each gamble when they can't predict how much it will cost them (because the number of output tokens is unbounded) and can't predict whether it will actually work for their needs? I have my doubts, for many people, and that makes for a ticking time bomb for the industry.
Read more about my work at www.stephaniekirmer.com. Also, you can see me speak live in person at ODSC East on April 30th in Boston.