The AI Model Confidence Trap

nimda May 26, 2026


I was in the mood for a bit of humor on Saturday and decided to ask ChatGPT a simple question: “Who won the Nobel Prize in Physics in 2025?”

ChatGPT quickly responded: “The 2025 Nobel Prize in Physics has been awarded to…” It even gave names, labs, and a description of the research that supposedly won them the prize!

There was only one problem, a very small one: the Nobel Prize had not yet been announced. However, the model did not hesitate. It did not pause, and it certainly did not say, “I don't have enough information” or, even better, “The winner of the 2025 Nobel Prize has not yet been announced!”

Instead, it confidently walked into the room, sat down, and delivered its story with the vigor of a PhD candidate. As someone who has defended a PhD, I wish I had ChatGPT's confidence when it makes things up!

As humans, we tend to do something interesting with confidence: we associate it with accuracy, but the two don't always go together. If one person says, “I think the answer might be 42” and another says, “The answer is definitely 42,” most of us instinctively trust the second person more, even though both of them might be wrong. For us, confidence sometimes serves as a useful signal of accuracy. In AI systems, however, confidence can be an incredibly unreliable guide.

In this article, we will explore why.

Confidence Feels Like Probability

Suppose we ask a model to predict which animal is in a given image. It says:

Cat: 0.97
Dog: 0.02
Bird: 0.01

Many will interpret that as: “The model is 97% sure that this is a cat.”

That's a reasonable interpretation. Unfortunately, that's often not what those numbers mean. We need to remember that most AI models that choose among a fixed set of options use a function called Softmax to generate predictions.

The Softmax function transforms the raw outputs (called logits) into values that are positive, sum to one, and look like probabilities. The important thing to note here is the exponential term, which can turn a small difference between logits into a very large difference between outputs.
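To make the effect of the exponential concrete, here is a minimal sketch of Softmax in plain Python. The logits are made up for illustration; the second set is simply the first set multiplied by 3, so the ranking is identical and only the gaps are wider.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into positive values that sum to 1."""
    # Subtracting the max is a standard numerical-stability trick; it does not
    # change the result because softmax ignores any constant added to all logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for (cat, dog, bird); the numbers are invented.
labels = ["cat", "dog", "bird"]
close_logits = [2.0, 1.0, 0.5]
wide_logits = [6.0, 3.0, 1.5]  # same ordering, every logit multiplied by 3

for name, logits in [("close", close_logits), ("wide", wide_logits)]:
    probs = softmax(logits)
    print(name, {label: round(p, 3) for label, p in zip(labels, probs)})
```

With these numbers the top score moves from roughly 0.63 to roughly 0.94, even though the model's ranking of the three animals never changed. The output measures how far apart the logits are, not how much evidence there is.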

[Figure omitted. Image by the author.]

So, the model isn't saying, “I have absolute proof that this is a cat.” It might simply be saying, “Of these choices, cat won by the widest margin.” Those are very different statements with completely different meanings.

Humans and AI Handle Uncertainty Differently

Although uncertainty can be uncomfortable to live with, humans are incredibly good at expressing and dealing with it.

We often hear phrases like “I could be wrong…”, “I'm sure…”, “It might be…”, or “I think…”. Our confidence tends to exist on a spectrum. AI systems, however, tend to behave like that one person in a group project who confidently explains something they learned three minutes ago (I'm sure we've all had that student in class too…).

So, when we talk to an LLM, telling it “I think Paris is the capital of France” and getting back “Paris is the capital of France, with 99.8% probability” carries the same force as telling it “I think Atlantis is a myth” and getting back “Atlantis is located about 400 miles west of Portugal, with 98.7% confidence.”

Although one of these answers is true and the other is fabricated, the LLM delivers them in the same way.

The Confident Fool's Problem

This creates what I think of as the confident fool's problem, where a system can be incredibly wrong while sounding incredibly accurate. And unfortunately, confidence tends to rise exactly when we'd like the model to be more cautious.

This is especially evident when models encounter inputs outside of their training distribution.

Suppose we train an image classifier to identify cats and dogs, and then decide to give it an image of a toaster! Ideally, the model should say, “I have absolutely no idea what this is,” which is how most people would react to being shown something they have never seen before. Instead, the model might respond:

Dog: 98%
Cat: 2%

Now, unless your toaster is shaped like a poodle, that answer is patently false!

Why is this happening? The answer is simpler than most people think. Simply put, the model has never been trained to say “None of the above.” Therefore, when it encounters something unusual, it picks whichever of the available options has the highest score.

It is like forcing someone to answer “What is this fruit?” while pointing at a bike. Eventually, they will pick a fruit just to resolve the situation and say, “A banana?”
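The forced choice can be sketched in a few lines. This is an illustration only: the classifier, the `CLASSES` list, and the toaster logits are all hypothetical, chosen so that neither raw score is large in absolute terms.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["dog", "cat"]  # the only answers this hypothetical classifier can give

def predict(logits):
    """Return the top label and its softmax score for a closed set of classes."""
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]

# Made-up logits for a toaster image. Softmax only sees the difference
# between the two scores, so there is no way to express "neither".
toaster_logits = [1.9, -2.0]
label, score = predict(toaster_logits)
print(f"{label}: {score:.0%}")  # dog: 98%

# Shifting both logits down by the same amount changes nothing.
shifted_label, shifted_score = predict([-8.1, -12.0])
print(f"{shifted_label}: {shifted_score:.0%}")
```

Because the scores are normalized across the available classes, even two very low raw scores produce a confident-looking winner. There is no slot for “None of the above” unless one is built in and trained.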

Let's simulate an overconfident model.
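The original simulation is not reproduced here, so what follows is a minimal stand-in under stated assumptions: a hypothetical model that always reports 0.90 confidence but is right only about 65% of the time. Both numbers are assumed for illustration, not measured from any real system.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N = 10_000
STATED_CONFIDENCE = 0.90  # what the hypothetical model reports
TRUE_HIT_RATE = 0.65      # how often it is actually right (assumed)

# Each prediction is correct with probability TRUE_HIT_RATE,
# regardless of the confidence the model reports.
correct = [random.random() < TRUE_HIT_RATE for _ in range(N)]

accuracy = sum(correct) / N
gap = STATED_CONFIDENCE - accuracy

print(f"stated confidence:  {STATED_CONFIDENCE:.2f}")
print(f"observed accuracy:  {accuracy:.2f}")
print(f"overconfidence gap: {gap:.2f}")
```

The observed accuracy should land near 0.65, leaving a gap of about 0.25 between what the model claims and what it delivers.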

If a model reports “90% confidence,” we would like to trust that it is correct about 90% of the time. Instead, many systems look more like “90% confidence, 65% accuracy.” This gap between confidence and accuracy is why the way we choose to train these models is so important.
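One common way to put a number on this gap is the expected calibration error (ECE), a metric the article itself does not name. The sketch below groups predictions into equal-width confidence bins and averages the distance between confidence and accuracy in each bin. The function name and the tiny data sets are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 would index past the end, so clamp it.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, h in items if h) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece

# 20 predictions at 90% confidence, only 13 correct (65% accuracy).
overconfident = expected_calibration_error([0.9] * 20, [True] * 13 + [False] * 7)

# 20 predictions at 90% confidence, 18 correct (90% accuracy).
calibrated = expected_calibration_error([0.9] * 20, [True] * 18 + [False] * 2)

print(f"overconfident ECE: {overconfident:.2f}")
print(f"calibrated ECE:    {calibrated:.2f}")
```

The first case reproduces the “90% confidence, 65% accuracy” situation and gives an ECE of about 0.25. The second gives an ECE of about 0, which is what “90% means 90%” looks like numerically.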

Teaching Models to Be More Trustworthy

OK, we know why models tend to be wrong about their confidence, but how can we get models whose accuracy matches their confidence? This is where calibration comes in.

Calibration does not improve predictions. Instead, it improves their credibility! So, if a model says 90% after calibration, that should mean: “Historically, predictions at this confidence level have been correct about 90% of the time.”

Methods such as:

Platt Scaling
Temperature Scaling
Isotonic Regression

try to reconcile predicted confidence with observed results.

Let's see how this looks:
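The original example is not available, so here is a minimal sketch of Temperature Scaling only, in plain Python. It assumes a made-up held-out set and picks the temperature by a coarse grid search; in practice the temperature is usually fit with a numerical optimizer on a real validation set. The names `fit_temperature`, `nll`, and `validation` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(dataset, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    return -sum(math.log(softmax(logits, temperature)[label])
                for logits, label in dataset) / len(dataset)

def fit_temperature(dataset):
    """Pick the temperature on a coarse grid (0.5 to 10.0) that minimizes NLL."""
    grid = [k / 10 for k in range(5, 101)]
    return min(grid, key=lambda t: nll(dataset, t))

# Made-up held-out set: the model always emits logits [4.0, 0.0] (about 98%
# confidence in class 0), but class 0 is the true label only 7 times out of 10.
validation = [([4.0, 0.0], 0)] * 7 + [([4.0, 0.0], 1)] * 3

T = fit_temperature(validation)
before = softmax([4.0, 0.0])[0]
after = softmax([4.0, 0.0], T)[0]
print(f"T = {T:.1f}, confidence before = {before:.2f}, after = {after:.2f}")
```

On this toy data the search should settle near T = 4.7, pulling the reported confidence from about 0.98 down to about 0.70, which matches how often the model was actually right. Dividing every logit by the same positive number never changes which class wins, so the predictions stay the same and only the confidence attached to them moves.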

Why This Matters

It's easy to laugh when an AI thinks a toaster is a dog, because that is, without a doubt, very funny. However, there are many situations that are not funny at all: serious ones, and possibly life-threatening ones. Applying AI models to medical diagnostic systems, autonomous vehicles, fraud detection, and financial forecasting requires high accuracy.

The doctor's response differs markedly depending on whether the model says “Chance of cancer: 99%” or “Chance of cancer: 62%”!

If confidence scores are not well calibrated, people may trust predictions that are not worthy of trust. And people are at great risk here because confidence feels persuasive, even when we know better.

As models move into real-world deployment, we may have to stop asking “How accurate is the model?” and start asking “If the model says 90%, does it really mean 90%?” Because there is a difference between a smart model and a reliable model.

People aren't perfect with uncertainty, either. We are overconfident all the time. We think we can finish the job in two days. We think we can assemble furniture without reading the instructions. We think we only need one trip from the car to carry in the groceries. Even when history suggests otherwise.

Maybe AI is simply inheriting some of our bad habits? The difference is that when a person is overconfident, usually only a few people suffer. When an AI is wrong about its confidence, the error can reach millions, and confidence at scale is a very different problem.

Final thoughts

For years, we've measured AI progress by asking increasingly impressive questions:

Can it write code? Can it be creative? Can it pass exams? Can it reason?

Those questions are useful, but sometimes they can distract us from something more important:

Can we trust it?

A model that produces the correct answer once is interesting. A model that repeatedly produces the correct answer while knowing when it might be wrong is something entirely different. Honesty rarely creates flashy headlines.

Confidence itself is not the problem. The problem begins when confidence becomes just the output of a function rather than a reasonable measure of certainty. As AI systems continue to enter healthcare, education, finance, research, and decision-making pipelines, we may need to stop treating confidence scores as measures of truth and start treating them as measurements that require validation.

Because making a model sound confident is easy. Knowing when a model is not reliable may be one of the most difficult problems we have yet to solve.
