
Why the Sophistication of Your Prompt Correlates Almost Perfectly with the Complexity of the Response, as Research by Anthropic Found

The idea is widespread in the AI field that prompt engineering is dead, or at least no longer works. This is, on the one hand, because newer language models are more flexible and robust, tolerating ambiguity far better; and on the other hand, because reasoning models can recover from poorly worded commands and thus better infer what the user actually wants. Whatever the exact reason, the era of "magic phrases" recited like incantations and of hyper-specific wording hacks seems to be fading. In that narrow sense, prompt engineering as a bag of tricks (which was scientifically analyzed in papers like this one by DeepMind, which revealed prompt seeds for language models back when GPT-4 was made available) is indeed dying.

But Anthropic has now put numbers behind something more subtle and important. They found that while the exact wording of the prompt matters less than ever, the "sophistication" behind the prompt matters more than ever. In fact, it correlates almost perfectly with the complexity of the model's response.

This is not a metaphor or a motivational slogan, but a tangible result obtained from the data Anthropic has collected since Claude's deployment. Read on, because this is all very interesting, well beyond its implications for how we use LLM-based AI systems.

Anthropic Economic Index: January 2026 Report

In the Anthropic Economic Index: January 2026 Report, lead authors Ruth Appel, Maxim Massenkoff, and Peter McCrory analyze how people actually use Claude across regions and situations. Starting with the most striking finding: they observed a strong quantitative relationship between the level of education required to understand the user's prompt and the level of education required to understand Claude's response. Across countries, the correlation coefficient is r = 0.925 (p < 0.001, N = 117). Across US states, it is r = 0.928 (p < 0.001, N = 50).
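For the statistically minded: the report does not publish its analysis code, but the headline numbers are plain Pearson correlations over per-unit averages. Here is a minimal sketch of that calculation on synthetic data; all values below are made up, and only the sample size echoes the report:

```python
# Minimal sketch of the report's correlation analysis, on synthetic data.
# The "years of education" values are hypothetical stand-ins for Anthropic's
# per-country estimates; only N = 117 echoes the report.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-country averages of the education level needed to
# understand prompts, with responses tracking them closely plus noise.
prompt_edu = rng.uniform(10, 18, size=117)
response_edu = prompt_edu + rng.normal(0.0, 0.6, size=117)

r, p = stats.pearsonr(prompt_edu, response_edu)
print(f"r = {r:.3f}, p = {p:.2e}")  # with noise this small, r lands near 0.9+
```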

This means that the more educated your writing, and the clearer your prompt, the better the answers. In simple words: how people prompt is how Claude responds.

And you know what? I have seen this myself when comparing how I and other PhD-level colleagues interact with AI systems versus how untrained users do.

From "prompt hacks" to "cognitive scaffolding"

Early discussions about prompt engineering focused on surface-level techniques: adding "let's think step by step", specifying a role ("act like a senior data scientist"), or carefully ordering instructions (more examples of this in the DeepMind paper I linked in the introduction). These methods were useful when the models were fragile and easily derailed – which, by the way, was exploited to override their safety rules, something that is much harder to achieve now.

But as models have evolved, many of these tricks have become optional. The same model can often reach a reasonable answer even without them.

Anthropic's findings explain why this has fed the idea that prompt engineering is obsolete. It turns out that the surface mechanics of the prompt (the syntax, the magic words, the formatting conventions) matter less and less. What has not disappeared is the importance of what they call "cognitive scaffolding": how well the user understands the problem, how precisely they articulate it, and whether they know what a good answer looks like; in other words, the critical thinking needed to tell good answers from useless ones.

The study operationalizes this idea using education as a quantitative proxy for complexity. The researchers estimated the number of years of education required to understand both the prompts and the responses, and found a very close correlation between the two! This suggests that Claude does not independently "raise" or "lower" the intellectual level of the conversation. Instead, it mirrors user input remarkably closely. That's great if you know what you're asking, but it makes the AI system far less effective if you don't know much about the subject yourself, or when you type a request or question too quickly and carelessly.
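The report doesn't spell out exactly how it scores texts, so take this only as an illustration: the classic Flesch-Kincaid grade level is one simple, fully transparent way to turn "years of education needed to understand this text" into a number, and a sketch of it looks like this:

```python
# Illustrative only: Anthropic does not say it used this formula. The
# Flesch-Kincaid grade level is a classic readability score that maps a text
# to the approximate US school grade needed to understand it.
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (n_syllables / n_words) - 15.59

prompt = "Why is the sky blue?"
response = ("Rayleigh scattering redistributes shorter wavelengths of sunlight "
            "far more efficiently than longer ones, so scattered blue light "
            "dominates the daytime sky.")
print(f"prompt grade:   {fk_grade(prompt):.1f}")
print(f"response grade: {fk_grade(response):.1f}")
```

In practice the report presumably relied on more robust, model-based estimates, but the core idea of scoring both sides of the conversation on the same education scale is the same.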

If the user provides a shallow, unspecific prompt, Claude often responds at an equally shallow level. When the prompt includes in-depth background knowledge, well-thought-out constraints, and clear standards of quality, Claude responds in kind. And hell yes, I've seen this in the ChatGPT and Gemini models, which are the ones I use the most.

Why this is not trivial

At first glance, this may sound obvious: of course better questions get better answers. But the magnitude of the correlation is what makes the result scientifically interesting. Correlations above 0.9 are rare in social and behavioral data, even across aggregated, heterogeneous units such as US states or countries. What the work found is therefore not a weak tendency but a remarkably tight relationship.

Notably, the findings contradict the common view of AI as a leveler that lets everyone receive the same quality of knowledge regardless of their language, level of education, and familiarity with the subject. There is widespread hope that advanced models will "upgrade" low-skill users by automatically providing professional-level output regardless of input quality. The results obtained by Anthropic suggest that this is not the case at all, or at best that it is heavily conditional. Although Claude (and this probably applies to all conversational AI models out there) is capable of generating very sophisticated responses, it usually only does so when the user provides sophisticated input.

The behavior of the model is not fixed; it is designed

Although this part of the report comes with no supporting data, and from my personal experience I would be inclined to disagree, it suggests that this mirroring effect is not a natural property of all language models, and that how a model reacts depends heavily on how it is trained, configured, and instructed. Even though I say I disagree, I can see the point: one can imagine a system prompt that forces the model to always use simple language regardless of the user's input, or, conversely, one that always responds in highly technical prose. But that would have to be designed in.
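To make that concrete, here is a minimal sketch of what "designing it in" could look like with the Anthropic Python SDK. The model name and the system prompt wording are my own illustrative choices, not anything from the report:

```python
# Sketch: pinning the response register via the system prompt, instead of
# letting the model mirror the user's input. Requires the `anthropic` package
# and an ANTHROPIC_API_KEY in the environment; the model name is illustrative.
import anthropic

client = anthropic.Anthropic()

SIMPLE_REGISTER = (
    "Always answer in plain language that a middle-school student could "
    "follow, no matter how technical the user's question is."
)

def ask(question: str, system: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=512,
        system=system,  # the register is fixed here, overriding any mirroring
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

# The same highly technical question now gets a deliberately simple answer:
print(ask("Derive the Navier-Stokes momentum equation from first principles.",
          SIMPLE_REGISTER))
```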

Claude seems to take a middle ground. Rather than enforcing a fixed register, it adapts its level of complexity to the user's input. This design choice amplifies the value of the user's skill: the model can reason at an expert level, but treats the prompt as a signal of how much of that skill it should deploy.

It would be really nice to see other big players like OpenAI and Google run similar analyses on their own usage data.

AI as a multiplier, not an equalizer

The cliché that "AI is the great equalizer" is often repeated without evidence, and as I mentioned above, the Anthropic analysis finally provides exactly that… and it points the other way.

If the model matches the sophistication of its output to the complexity of the input, then it is not a substitute for human expertise (nor an equalizer); rather, it is a multiplier. And this is good news for users who apply AI systems within their own technical domains.

A weak foundation multiplied by a powerful tool is still weak; in the best case you can use the AI as a consultant to get started in a field, as long as you know enough to at least tell realistic ideas from fabricated ones. A strong foundation, in contrast, benefits enormously, because you start with more and get more. For example, I often brainstorm with ChatGPT, or better with Gemini 3 in AI Studio, about the equations that describe physical phenomena, so that in the end I can get pieces of code or even full applications to, say, fit data to quite complex mathematical models. Yes, I could do all that myself, but by carefully writing my prompts the AI system gets the job done in literally orders of magnitude less time than I would otherwise need.

All of this framing helps reconcile two seemingly contradictory narratives about AI. On the one hand, models are undeniably impressive and can outperform humans on many narrow tasks. On the other hand, they tend to disappoint when used carelessly. The difference is not primarily the wording of the prompt, but the user's understanding of the domain, the problem structure, and the success criteria.

Implications for education and work

Another implication is that investment in human capital remains important, and substantially so. As models become better mirrors of user expertise, differences in expertise may become more visible, not less, contrary to what the "equalizer" narrative suggests. Those who can produce precise, well-grounded prompts will extract far more value from the same underlying model than those who cannot.

This also redefines what "prompt engineering" should mean going forward. It is less about learning a narrow technical skill and more about cultivating culture: background knowledge, critical thinking, problem decomposition. Knowing what to ask, and how to recognize a good answer, becomes the real interface. All of this is probably obvious to us Towards Data Science readers, but here Anthropic has established it quantitatively, which makes it all the more compelling.

Remarkably, to close, the Anthropic data makes its point with extraordinary clarity. And again, we should call on all the big players (OpenAI, Google, Meta, etc.) to perform the same analysis on their usage data, and to present the results to the public as Anthropic does.

And just as we have long been fighting for widespread free access to AI chat systems, for clear guidelines to suppress misinformation and willful misuse, for ways to eliminate or at least flag problematic outputs, and more, we can now add to the list a request for real equity.

References and related reading

To learn everything about the Anthropic report (which touches on many other interesting points and provides all the details on the analyzed data):

You may also find Microsoft's "New Future of Work Report 2025" interesting, with which the Anthropic research makes some comparisons; it is available here:

My previous post “Two New Papers by DeepMind Show How Artificial Intelligence Can Help Human Intelligence”:

My previous post “New DeepMind Work Reveals Superfast Seeds for Language Models”:
