Dynamic Thinking: Large Language Models Know When to Think in Latent Space

Recent advances in large language models (LLMs) have introduced test-time scaling: the ability to perform intermediate chain-of-thought (CoT) reasoning before generating answers. Although increasing the inference budget yields smooth performance improvements on average, the relationship between model capability, query difficulty, and the optimal budget allocation remains poorly understood. To address this challenge, we use stability, the agreement among multiple sampled reasoning paths, as a proxy for how much reasoning a query requires. We first observe that low stability identifies questions that require extended thinking to reach correct answers. Building on this observation, we present Sonata (a Stability-Oriented Adapter for Thought Allocation), a lightweight approach that dynamically allocates thinking budgets to optimize the accuracy-efficiency tradeoff. Sonata uses an adapter, trained offline on a benchmark dataset, that predicts a query's stability directly from the model's last-layer hidden representations of the query, before generation begins. This prediction then guides the thinking-budget allocation entirely ahead of time. Once trained, the adapter is generic and transferable across tasks, and it introduces near-zero computational overhead at inference. Notably, Sonata is orthogonal to existing CoT compression methods, enabling additional efficiency gains by managing the reasoning budget across queries. Extensive experiments on multiple models (Qwen3-8B, GPT-OSS-120B, Qwen3-235B-A22B, Intern-S1-mini) and benchmarks (AIME24, AIME25, GSM8K, MATH500, GPQA) show that Sonata achieves a 20% to 80% reduction in token cost at the same accuracy, or up to a 5% accuracy improvement at the same token cost.
† Work done while at Apple
‡ University of North Carolina at Chapel Hill
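The abstract does not specify implementation details. As a rough illustration only, the sketch below assumes that stability is measured as majority-vote agreement among k sampled final answers, that the adapter is a linear probe on the pooled last-layer hidden state of the query, and that the token budget interpolates linearly between a floor and a ceiling. The names and parameters (stability, BudgetAdapter, allocate_budget, min_tokens, max_tokens) are hypothetical, not from the paper.

```python
from collections import Counter

import torch


def stability(answers: list[str]) -> float:
    """Agreement among sampled reasoning paths: fraction of the k final
    answers that match the majority answer (assumed definition)."""
    counts = Counter(answers)
    return max(counts.values()) / len(answers)


class BudgetAdapter(torch.nn.Module):
    """Hypothetical lightweight adapter: a linear probe mapping the
    query's last-layer hidden state to a predicted stability in [0, 1]."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.probe = torch.nn.Linear(hidden_dim, 1)

    def forward(self, h_last: torch.Tensor) -> torch.Tensor:
        # h_last: (batch, hidden_dim), e.g. mean-pooled last-layer states
        # of the query tokens, taken before generation begins.
        return torch.sigmoid(self.probe(h_last)).squeeze(-1)


def allocate_budget(pred_stability: float,
                    min_tokens: int = 512,
                    max_tokens: int = 16384) -> int:
    """Low predicted stability -> larger thinking budget (assumed linear
    mapping between illustrative floor/ceiling values)."""
    return int(min_tokens + (1.0 - pred_stability) * (max_tokens - min_tokens))
```

Under these assumptions, the multi-sample stability computation is needed only offline, to produce training targets for the probe; at inference, only the single linear probe runs on top of the normal forward pass, which is consistent with the abstract's claim of near-zero overhead.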



