
DeepSeek-AI Releases DeepSeek-R1-Zero and DeepSeek-R1: First-Generation Reasoning Models That Promote Reasoning Skills in LLMs Through Reinforcement Learning

Large Language Models (LLMs) have made significant progress in natural language processing, excelling at tasks such as comprehension, generation, and reasoning. However, challenges remain. Achieving robust reasoning often requires extensive supervised fine-tuning, which limits scalability and generalization. In addition, issues such as poor readability and the trade-off between computational cost and reasoning depth persist, prompting researchers to explore new approaches.

DeepSeek-R1: A New Approach to LLM Reasoning

DeepSeek-AI's latest work presents DeepSeek-R1, a model designed to improve reasoning ability through reinforcement learning (RL). This effort produced two models:

  • DeepSeek-R1-Zero, which is trained purely through RL and exhibits emergent reasoning behaviors such as long Chain-of-Thought (CoT) reasoning.
  • DeepSeek-R1, which builds on its predecessor by integrating a multi-stage training pipeline, addressing challenges such as readability and language mixing while maintaining strong reasoning performance.

These models aim to overcome existing limitations, combining new RL techniques with structured training processes to achieve scalability and usability.

Technological Innovation and Benefits

1. Reinforcement Learning for Reasoning Tasks: DeepSeek-R1-Zero is trained with RL alone, without relying on supervised data. It uses Group Relative Policy Optimization (GRPO), which improves reasoning by scoring groups of sampled outputs against one another rather than training a separate critic, yielding large benchmark gains. For example, its AIME 2024 pass@1 score rose from 15.6% to 71.0% over the course of training.
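A minimal sketch of the group-relative advantage computation at the core of GRPO (the normalization follows the published GRPO formulation; the reward values below are illustrative):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: each sampled output's reward is
    normalized against its group's mean and standard deviation,
    removing the need for a separate value (critic) model."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Illustrative: four completions sampled for one prompt, scored by a
# rule-based reward (1.0 = correct final answer, 0.0 = incorrect).
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
# ≈ [ 1., -1.,  1., -1.]  (correct samples receive positive advantage)
```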

2. Multi-Stage Training for DeepSeek-R1: DeepSeek-R1 begins with cold-start data—thousands of curated CoT examples—used to fine-tune its base model before reasoning-oriented RL. Incorporating language-consistency rewards during RL keeps the outputs coherent and user-friendly.
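As a rough sketch of how such a reward could be combined with an accuracy signal (the per-token language detector and the weighting below are illustrative assumptions, not values from the paper):

```python
def language_consistency(cot_tokens, target_lang, lang_of):
    """Fraction of chain-of-thought tokens written in the target
    language; `lang_of` is a hypothetical per-token language detector."""
    if not cot_tokens:
        return 0.0
    return sum(lang_of(tok) == target_lang for tok in cot_tokens) / len(cot_tokens)

def combined_reward(accuracy_reward, consistency, lang_weight=0.1):
    # The paper adds the language-consistency signal to the task reward;
    # the coefficient here is an assumption for illustration.
    return accuracy_reward + lang_weight * consistency
```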

3. Distillation into Smaller Models: To address computational constraints, DeepSeek-AI distilled six smaller models (1.5B to 70B parameters) from DeepSeek-R1 on top of the Qwen and Llama architectures. These models retain strong reasoning ability; the 14B distilled model scores a pass@1 of 69.7% on AIME 2024, outperforming other leading models.
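The distilled checkpoints are released openly; here is a minimal sketch of loading one with the Hugging Face transformers library (the repo id follows DeepSeek-AI's published naming, but verify it against the model hub before use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"  # assumed published repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Ask for step-by-step reasoning; the distilled models emit long CoT traces.
prompt = "Solve: if 3x + 5 = 20, what is x? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```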

Results: Implications for Practice

DeepSeek-R1's performance is supported by benchmark results:

  • Reasoning benchmarks:
    • AIME 2024: 79.8% pass@1, surpassing OpenAI's o1-mini.
    • MATH-500: 97.3% pass@1, comparable to OpenAI-o1-1217.
    • GPQA Diamond: 71.5% pass@1, excelling at fact-based reasoning.
  • Coding and STEM tasks:
    • Codeforces: an Elo rating of 2,029, outperforming 96.3% of human participants.
    • SWE-bench Verified: 49.2% resolution rate, competitive with other leading models.
  • General capabilities:
    • Strong generalization on the ArenaHard and AlpacaEval 2.0 benchmarks, with win rates of 92.3% and 87.6%, respectively.

Distilled model highlights: smaller models such as DeepSeek-R1-Distill-Qwen-32B deliver strong performance, with a pass@1 score of 72.6% on AIME 2024, showing that the distilled models preserve much of the full model's reasoning ability.

Conclusion: Refining Reasoning in AI

DeepSeek-AI's DeepSeek-R1 and DeepSeek-R1-Zero represent a meaningful advance in the reasoning capabilities of LLMs. By combining RL, cold-start data, and distillation techniques, these models address important limitations while improving accessibility through open-source release under the MIT License. The API ('model=deepseek-reasoner') also makes the models easy for developers and researchers to use.
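A minimal usage sketch of that endpoint through DeepSeek's OpenAI-compatible API (the base URL follows DeepSeek's public documentation; the API key is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 30?"}],
)
print(response.choices[0].message.content)
```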

Looking ahead, DeepSeek-AI plans to strengthen multilingual support, improve software-engineering capabilities, and reduce the models' sensitivity to prompting. These efforts aim to further establish DeepSeek-R1 as a robust solution for reasoning-focused AI applications. By combining novel training paradigms, DeepSeek-R1 demonstrates how AI can keep improving at complex challenges.


Check out the Paper, DeepSeek R1, and DeepSeek R1 Zero. All credit for this study goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 65k+ ML SubReddit.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the power of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.
