Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs

Large Language Models (LLMs) often lack meaningful confidence estimates for their outputs. Although base LLMs are known to exhibit next-token calibration, it remains unclear whether they can assess confidence in the actual meaning of their responses beyond the token level. We find that, under a sampling-based notion of semantic calibration, base LLMs are remarkably well-calibrated: they can meaningfully assess confidence in open-domain question-answering tasks, despite not being explicitly trained to do so. Our main theoretical contribution establishes a mechanism for why semantic calibration emerges as a byproduct of next-token prediction, building on a recent connection between calibration and local loss optimality. The theory rests on a general definition of "B-calibration," a notion of calibration parameterized by a choice of equivalence classes (semantic or otherwise). This theoretical mechanism leads to a testable prediction: base LLMs will be semantically calibrated when they can easily predict their own distribution over semantic answer classes before generating a response. We state three implications of this prediction, which we confirm experimentally: (1) base LLMs are semantically calibrated across question-answering tasks, (2) RL instruction-tuning systematically breaks this calibration, and (3) chain-of-thought reasoning breaks calibration. To our knowledge, our work provides the first principled explanation of when and why semantic calibration emerges in LLMs.
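As a rough illustration of the sample-based view of confidence described above, the following minimal sketch groups several sampled answers to one question into semantic classes, takes the relative frequency of the most common class as the model's confidence, and then compares confidence with accuracy across questions. Everything here is an assumption made for illustration: the names `semantic_confidence`, `expected_calibration_error`, and `canonicalize` are hypothetical, the string normalization stands in for whatever procedure maps responses to semantic classes, and the binned top-class error is one common calibration measure, not necessarily the one used in this work.

```python
from collections import Counter


def semantic_confidence(samples, canonicalize):
    """Return (majority semantic class, its empirical frequency among samples).

    `canonicalize` is a hypothetical stand-in for a function that maps a raw
    response string to its semantic equivalence class.
    """
    classes = [canonicalize(s) for s in samples]
    top_class, top_count = Counter(classes).most_common(1)[0]
    return top_class, top_count / len(classes)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error: weighted mean of |accuracy - confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        error += (len(members) / total) * abs(accuracy - mean_conf)
    return error


def normalize(text):
    # Toy assumption: answers are equivalent if they match after trimming,
    # lowercasing, and dropping a trailing period.
    return text.strip().lower().rstrip(".")


if __name__ == "__main__":
    # Four hypothetical samples for a single question.
    answer, confidence = semantic_confidence(
        ["Paris", "paris ", "Lyon", "Paris."], normalize
    )
    print(answer, confidence)
```

In this toy setting, a model would count as calibrated at the semantic level if, among questions where the majority class receives about 75% of the samples, that majority answer is correct about 75% of the time.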



