What is DeepSeek-V3.1 and why is everyone talking about it?

Chinese AI startup DeepSeek has released DeepSeek-V3.1, its latest flagship language model. It builds on the architecture of DeepSeek-V3 and adds significant enhancements to reasoning, tool use, and coding performance. Notably, DeepSeek models have quickly gained a reputation for delivering OpenAI- and Anthropic-level performance at a fraction of the cost.
Model Architecture and Capabilities
- Hybrid thinking mode: DeepSeek-V3.1 supports both thinking (chain-of-thought reasoning, more deliberate) and non-thinking (direct response) generation, switchable via the chat template. This is a departure from previous versions and offers flexibility across use cases.
- Tool and agent support: The model is optimized for tool calling and agent tasks (e.g., using APIs, executing code, searching). Tool calls use a structured format, and the model supports custom code agents and search agents, with detailed templates provided in the repository.
- Massive scale, efficient activation: The model has 671B total parameters, with 37B activated per token. This Mixture-of-Experts (MoE) design lowers inference cost while maintaining capacity. The context window is 128K tokens, larger than that of many competitors.
- Long-context extension: DeepSeek-V3.1 uses a two-phase long-context extension approach. The first phase (32K) was trained on 630B tokens (10x more than V3), and the second (128K) on 209B tokens (3.3x more than V3). The model is trained with FP8 microscaling for efficient arithmetic on next-generation hardware.
- Chat template: The template supports multi-turn conversations with explicit tokens for system prompts, user queries, and assistant responses. The thinking and non-thinking modes are selected by special tokens in the prompt sequence.
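
The scale and long-context figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list. The function and variable names are illustrative, and the "implied V3" token counts are derived from the stated multipliers, not figures reported by DeepSeek.

```python
# Figures quoted in the list above (in billions).
TOTAL_PARAMS_B = 671.0   # total parameters
ACTIVE_PARAMS_B = 37.0   # parameters activated per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active_b / total_b


def implied_baseline(tokens_b: float, multiplier: float) -> float:
    """Token count (billions) implied for V3 by V3.1's count and its stated multiple."""
    return tokens_b / multiplier


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
phase_32k_v3 = implied_baseline(630.0, 10.0)    # 32K phase, "10x more than V3"
phase_128k_v3 = implied_baseline(209.0, 3.3)    # 128K phase, "3.3x more than V3"

print(f"Active per token: {fraction:.1%} of all parameters")
print(f"Implied V3 32K-phase tokens:  ~{phase_32k_v3:.0f}B")
print(f"Implied V3 128K-phase tokens: ~{phase_128k_v3:.0f}B")
```

Under these numbers, only about one parameter in eighteen is active for any given token, and both multipliers point back to a V3 baseline of roughly 63B tokens per phase, so the two quoted ratios are at least consistent with each other.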

Performance Benchmarks
DeepSeek-V3.1 is evaluated across a wide range of benchmarks (see the table below), covering general knowledge, coding, math, tool use, and agent tasks. Highlights:
| Metric | V3.1-NonThinking | V3.1-Thinking | Comparison |
|---|---|---|---|
| MMLU-Redux (EM) | 91.8 | 93.7 | 93.4 (R1-0528) |
| MMLU-Pro (EM) | 83.7 | 84.8 | 85.0 (R1-0528) |
| GPQA-Diamond (Pass@1) | 74.9 | 80.1 | 81.0 (R1-0528) |
| LiveCodeBench (Pass@1) | 56.4 | 74.8 | 73.3 (R1-0528) |
| AIME 2025 (Pass@1) | 49.8 | 88.4 | 87.5 (R1-0528) |
| SWE-bench (Agent mode) | 54.5 | – | 30.5 (R1-0528) |
The thinking mode consistently matches or exceeds the previous state-of-the-art version, especially in coding and math. The non-thinking mode is faster but somewhat less accurate, making it a good fit for latency-sensitive applications.
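
To make the trade-off between the two modes concrete, the sketch below computes the thinking-mode gain for three of the Pass@1 rows in the table (GPQA-Diamond, LiveCodeBench, and AIME 2025). The scores are copied from the table. The dictionary layout and helper name are illustrative choices.

```python
# Pass@1 scores from three rows of the table: (non-thinking, thinking).
scores = {
    "GPQA-Diamond": (74.9, 80.1),
    "LiveCodeBench": (56.4, 74.8),
    "AIME 2025": (49.8, 88.4),
}


def thinking_gain(non_thinking: float, thinking: float) -> float:
    """Absolute improvement, in points, from enabling thinking mode."""
    return round(thinking - non_thinking, 1)


gains = {name: thinking_gain(*pair) for name, pair in scores.items()}
largest = max(gains, key=gains.get)

# Print benchmarks from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain} points")
print(f"Largest gain: {largest}")
```

On these rows the gap is widest on the math and coding benchmarks and narrowest on GPQA-Diamond, which fits the observation that thinking mode helps most in coding and math.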


Tool and Code Agent Integration
- Tool calling: Structured tool invocations are supported in non-thinking mode, allowing scripted workflows with external APIs and services.
- Code agents: Developers can build custom code agents by following the provided trajectory templates, which detail the interaction protocol for code generation, execution, and debugging. DeepSeek-V3.1 can also use external search tools to retrieve up-to-date information, an important feature for business, finance, and technical research applications.
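
As a rough illustration of what a structured tool call involves on the application side, here is a minimal dispatcher sketch. It is not DeepSeek's actual tool-call format: the JSON shape (`name` plus `arguments`), the `TOOLS` registry, and the example tools `add` and `lookup_price` are all hypothetical, chosen only to show the pattern of parsing a model-emitted call and routing it to local code.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    return a + b


@tool
def lookup_price(symbol: str) -> dict:
    # Stub standing in for an external API; the data is made up.
    prices = {"ACME": 12.5}
    return {"symbol": symbol, "price": prices.get(symbol)}


def dispatch(call_json: str) -> Any:
    """Parse a JSON tool call (assumed shape) and run the matching tool."""
    call = json.loads(call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result)
```

In a real integration, the call string would be extracted from the model's output according to the templates in the repository, and the tool result would be fed back to the model as the next turn.
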
Deployment
- Open source, MIT License: All model weights and code are freely available on Hugging Face and ModelScope under the MIT License, encouraging both research and commercial use.
- Local inference: The model structure is compatible with DeepSeek-V3, and detailed instructions for local deployment are provided. Running the model requires significant GPU resources because of its scale, but the open ecosystem and community tools lower the barrier to adoption.
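
A back-of-the-envelope estimate shows why the hardware requirement is significant. The sketch below counts only the memory needed to hold the weights, assuming 1 byte per parameter for FP8 and 2 bytes for BF16. The 80 GB accelerator size is an illustrative assumption, and the estimate ignores the KV cache, activations, and runtime overhead, so real deployments need more.

```python
import math

TOTAL_PARAMS = 671e9  # total parameter count quoted in the article

# Storage size per parameter for two common precisions.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def min_gpus(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Lower bound on accelerators needed just to hold the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


for precision in BYTES_PER_PARAM:
    gb = weight_memory_gb(TOTAL_PARAMS, precision)
    print(f"{precision}: ~{gb:.0f} GB of weights, "
          f"at least {min_gpus(gb)} x 80 GB accelerators")
```

Although only 37B parameters are active per token, a straightforward deployment generally keeps all experts resident in memory, which is why the total parameter count drives this estimate.
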
Summary
DeepSeek-V3.1 represents a milestone in the democratization of advanced AI, demonstrating that open-source, cost-efficient, and highly capable language models are possible. Its combination of scalable reasoning, tool integration, and strong performance in coding and math positions it as a practical choice for both research and applied AI development.

Michal Sutter holds a Master of Science in Data Science from the University of Padova. With a solid foundation in mathematics, machine learning, and data engineering, Michal excels at transforming complex information into actionable insights.



