DeepSeek researchers introduce DeepSeek-V3.2 and DeepSeek-V3.2-Speciale for long-context and agentic workloads

How do you get GPT-5-level reasoning in real, tool-using workloads without paying the quadratic attention and GPU costs that usually make such systems impractical? DeepSeek presents DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, reasoning-first models designed for agents that target high-quality reasoning, long-context handling, and agentic workflows, with open weights and production APIs. The models combine DeepSeek Sparse Attention (DSA), a scaled GRPO reinforcement-learning stack, and an agent-oriented tool protocol, and report performance comparable to GPT-5, with DeepSeek-V3.2-Speciale approaching Gemini-3.0-Pro on community benchmarks and competitions.

Sparse attention with near-linear long-context cost
Both DeepSeek-V3.2 and DeepSeek-V3.2-Speciale use the DeepSeek-V3 MoE transformer architecture with 671B total parameters and 37B active parameters per token, inherited from DeepSeek-V3.1-Terminus. The only architectural change, DeepSeek Sparse Attention, is introduced through continued pre-training.
DeepSeek Sparse Attention splits attention into two components. A lightning indexer uses a small number of heads with low-precision computation over all preceding tokens and produces compatibility scores. A fine-grained token selector keeps the top-k key-value positions for each query, and the main attention mechanism, multi-head latent attention (MLA), runs only over this selected set.
This changes the complexity from O(L²) to O(L·k), where L is the sequence length and k is the number of selected tokens, with k much smaller than L. On benchmarks, DeepSeek-V3.2 matches the dense V3.1-Terminus baseline in accuracy while cutting long-context inference cost by roughly 50 percent, with faster decoding and lower memory consumption on H800-class hardware.
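To make the two-stage mechanism concrete, here is a minimal PyTorch sketch of DSA-style attention under simplifying assumptions: a single head, plain dot-product attention instead of MLA, and no low-precision kernels. Function and variable names are ours, not DeepSeek's.

```python
import torch
import torch.nn.functional as F

def dsa_sketch(q, k, v, idx_q, idx_k, top_k=2048):
    """q, k, v: [L, d] main-attention tensors; idx_q, idx_k: [L, d_idx]
    cheap indexer projections used only to score query-key pairs."""
    L, d = q.shape
    # 1) Lightning indexer: compatibility scores over ALL tokens. Still O(L^2),
    #    but with a tiny head dimension (and low precision in the real system).
    scores = idx_q @ idx_k.T                                   # [L, L]
    causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))
    # 2) Fine-grained token selector: keep top-k key/value positions per query.
    k_eff = min(top_k, L)
    top_idx = scores.topk(k_eff, dim=-1).indices               # [L, k_eff]
    k_sel, v_sel = k[top_idx], v[top_idx]                      # [L, k_eff, d]
    # 3) Main attention over the selected set only: O(L*k) instead of O(L^2).
    att = torch.einsum("ld,lkd->lk", q, k_sel) / d ** 0.5
    # Re-apply causality: early rows have fewer than k_eff valid positions.
    att = att.masked_fill(top_idx > torch.arange(L).unsqueeze(1), float("-inf"))
    att = F.softmax(att, dim=-1)
    return torch.einsum("lk,lkd->ld", att, v_sel)              # [L, d]

# Toy usage; the production model keeps 2048 tokens per query.
L, d, d_idx = 16, 8, 4
out = dsa_sketch(torch.randn(L, d), torch.randn(L, d), torch.randn(L, d),
                 torch.randn(L, d_idx), torch.randn(L, d_idx), top_k=4)
```

The point of the split is that the expensive part of attention (large head dimension, value aggregation) touches only k positions per query, while the full-sequence pass is delegated to the deliberately cheap indexer.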


Continued pre-training turns dense attention sparse
DeepSeek Sparse Attention (DSA) is introduced through a continued pre-training run on top of DeepSeek-V3.1-Terminus. In the dense warm-up stage, dense attention remains active, all backbone parameters are frozen, and only the lightning indexer is trained, using a Kullback-Leibler divergence loss to match the dense attention distribution at a 128K sequence length. This stage uses a small number of steps, on the order of 2B tokens, which is enough for the indexer to learn useful scores.
In the sparse stage, the selector keeps the top 2048 key-value entries per query, the backbone is unfrozen, and the model continues training on roughly 944B tokens. Gradients for the indexer still come only from the alignment loss, computed over the selected positions. This recipe lets DSA act as a drop-in replacement that preserves quality while cutting long-context cost.
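A minimal sketch of the warm-up alignment objective, assuming the indexer is trained to match the frozen dense attention distribution with a KL loss. The loss function name and shapes are illustrative, not DeepSeek's actual code.

```python
import torch
import torch.nn.functional as F

def indexer_alignment_loss(indexer_logits: torch.Tensor,
                           dense_attn: torch.Tensor) -> torch.Tensor:
    """indexer_logits: [L, L] raw scores from the lightning indexer.
    dense_attn:        [L, L] attention probabilities from the frozen dense
                       backbone, treated as the target and detached."""
    log_p = F.log_softmax(indexer_logits, dim=-1)
    # KL(dense || indexer): the indexer learns to put mass where dense attention does.
    return F.kl_div(log_p, dense_attn.detach(), reduction="batchmean")

# Warm-up stage: backbone frozen, only indexer parameters receive gradients.
# In the later sparse stage, the same alignment loss is restricted to the
# selected top-2048 positions while the backbone is unfrozen and trained normally.
```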


GRPO with an RL budget above 10 percent of pre-training compute
On top of the sparse backbone, DeepSeek-V3.2 uses Group Relative Policy Optimization (GRPO) as the main reinforcement-learning method. The research team reports that post-training RL compute exceeds 10 percent of pre-training compute.
RL is organized around specialist domains. The team trains experts dedicated to mathematics, competitive programming, general logical reasoning, browsing and agentic tasks, and safety, then distills these experts back into the 685B-parameter base for DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. GRPO runs with an unbiased KL estimator, conservative off-policy sequence-level updates, and mixture-of-experts (MoE) routing and sampling masks kept consistent between training and inference.
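For readers unfamiliar with GRPO, the core idea is the group-relative advantage: sample a group of responses per prompt, score them, and normalize rewards within the group, so no learned value model is needed. The sketch below shows only that step; the unbiased KL estimator, clipping, and MoE routing consistency mentioned above are omitted.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [num_prompts, group_size] scalar rewards for sampled responses.
    Returns per-response advantages normalized within each prompt's group."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled responses each. Responses scoring above their
# group's average get positive advantages and are reinforced.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.9, 0.4, 0.7]])
adv = grpo_advantages(rewards)
```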


Agent data, thinking mode and tool protocol
The DeepSeek research team built a large synthetic dataset by generating more than 1,800 environments and more than 85,000 tasks across code agents, search agents, general tool use, and code-interpreter tools. Tasks are designed to be hard to solve but easy to verify, and serve as RL targets alongside real coding and search workloads.
At inference time, DeepSeek-V3.2 introduces explicit thinking and non-thinking modes. The deepseek-reasoner endpoint exposes the thinking mode, in which the model generates an internal chain of thought before the final answer. The tool-calling guide explains how thinking content stored in the conversation is consumed and cleared when a new user message arrives, how tool calls can be issued from within thinking mode, and how context budgets determine whether reasoning traces are retained.
The chat template is updated for this behavior. DeepSeek-V3.2 ships with Python encoder and decoder helpers instead of a Jinja template. Messages can carry a reasoning_content field alongside content, controlled by a thinking parameter. The developer role is reserved for search agents and is not accepted in general chat flows by the official API, which protects this channel from accidental misuse.
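A hedged sketch of what calling the thinking-mode endpoint looks like through the OpenAI-compatible API. The model id and the reasoning_content field follow DeepSeek's public docs as we understand them; treat the details as assumptions rather than a guaranteed interface.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # thinking-mode endpoint
    messages=[{"role": "user", "content": "Plan a 3-step web search for X."}],
)
msg = resp.choices[0].message
print(msg.reasoning_content)  # internal chain of thought, returned separately
print(msg.content)            # final answer
# Per the tool-calling guide, reasoning_content is not sent back in subsequent
# requests; thinking content is cleared when a new user message arrives.
```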


Benchmarks, competitions and open artifacts
On standard public benchmarks, DeepSeek-V3.2 and especially DeepSeek-V3.2-Speciale are reported to be comparable to GPT-5 and close to Gemini-3.0-Pro on suites such as AIME 2025, HMMT 2025, GPQA, and LiveCodeBench, with better cost-efficiency for long-context work.
For organized competitions, the research team reports that DeepSeek-V3.2-Speciale achieves gold-medal-level performance at the International Mathematical Olympiad 2025, the Chinese Mathematical Olympiad 2025, and the International Olympiad in Informatics 2025, and medal-level performance at the ICPC World Finals 2025.
Key takeaways
- DeepSeek-V3.2 adds DeepSeek Sparse Attention (DSA), which brings near-O(L·k) attention cost and roughly 50 percent lower long-context API cost compared to its dense-attention predecessor, while matching DeepSeek-V3.1-Terminus in quality.
- The model family keeps the 671B-parameter MoE backbone with 37B active parameters per token and a full 128K context window in production deployments; the attention path is the only architectural change.
- Post-training uses Group Relative Policy Optimization (GRPO) with an RL compute budget exceeding 10 percent of pre-training compute, spread across specialist experts for mathematics, coding, general reasoning, browsing and agentic tasks, and safety, validated on competition-style external benchmarks.
- DeepSeek-V3.2 is the first model in the DeepSeek family to integrate thinking directly into tool use, supporting both thinking and non-thinking modes, with thinking content cleared when a new user message arrives.
Check out the Paper and Model weights for full details.



