Guru: A Curated RL Dataset for Generalizing LLM Reasoning Across Six Domains

The Limitations of Narrow-Domain Reinforcement Learning
Reinforcement learning (RL) has shown strong potential to improve the reasoning abilities of LLMs, particularly in leading systems such as OpenAI-o3 and DeepSeek-R1. However, most RL research has focused on math and code, which limits its broader applicability. This narrow focus creates two problems: our understanding of how RL enhances reasoning may not generalize beyond these domains, and the resulting models often lack versatility. Extending RL to wider reasoning tasks is challenging because of a lack of reliable reward signals and curated datasets, which are far easier to define for math and code than for open-ended reasoning domains.
Narrow Domain Focus and the Challenge of Generalization
Reinforcement learning has become a popular way to develop reasoning skills in LLMs, especially following the success of models such as OpenAI's o3 and DeepSeek-R1. Many open-source efforts have followed suit, focusing on the math and coding domains. While these models perform well within their niches, their reasoning often fails to generalize more broadly. In parallel, research has examined how RL actually shapes reasoning. Some studies suggest that RL does not teach new skills but instead amplifies a model's ability to access reasoning patterns it already possesses. However, newer work indicates that extended RL training may unlock genuinely new reasoning strategies.
Introducing the Guru Dataset: A Multi-Domain RL Benchmark
Researchers from UC San Diego, MBZUAI, Carnegie Mellon University, and Purdue University have launched Guru, a 92K-example RL dataset covering six reasoning domains: Math, Code, Science, Logic, Simulation, and Tabular. Each domain is carefully built with corresponding reward functions and rigorous filtering. Training models on Guru shows that RL results depend heavily on the domain: domains well represented in pretraining benefit from cross-domain RL, while less familiar domains require in-domain training to improve. Their models, Guru-7B and Guru-32B, outperform open models by up to 7.9% across 17 tasks. These findings underscore the domain-dependent nature of RL gains and the importance of broad, multi-domain benchmarks.
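To make "corresponding reward functions" concrete, here is a minimal sketch of what verifiable, per-domain rewards could look like. The function names, answer formats, and the `solve` entry point are illustrative assumptions, not Guru's actual implementation:

```python
# Hypothetical sketch of verifiable, per-domain rewards (not Guru's actual code).
import re

def math_reward(response: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the final \\boxed{...} answer matches the reference."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    return 1.0 if match and match.group(1).strip() == gold_answer.strip() else 0.0

def code_reward(program: str, test_cases: list[tuple[str, str]]) -> float:
    """Fraction of unit tests passed; a real system would sandbox with timeouts."""
    passed = 0
    for arg, expected in test_cases:
        try:
            namespace: dict = {}
            exec(program, namespace)  # illustrative only; never run untrusted code unsandboxed
            passed += str(namespace["solve"](arg)).strip() == expected
        except Exception:
            pass  # crashes and missing entry points simply earn no credit
    return passed / max(len(test_cases), 1)

# Example: a correct one-line solution passes both tests, yielding reward 1.0.
print(code_reward("def solve(x): return int(x) * 2", [("2", "4"), ("5", "10")]))
```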
Cross-Domain vs. In-Domain Reinforcement Learning Outcomes
To better understand how RL generalizes across domains, the researchers trained models both on individual domains and on mixed corpora drawn from the Guru data. They found that domains such as Math, Code, and Science benefited substantially from cross-domain RL, likely due to their strong presence in pretraining corpora. Mixed-domain training matched or outperformed single-domain training, indicating that combining diverse tasks can improve general reasoning. However, training only on difficult examples improved in-domain performance while reducing accuracy on simpler tasks from other domains. These findings suggest that balanced data difficulty and domain diversity are essential for reasoning skills that transfer.
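As a rough illustration of the mixed-domain setup described above, the sketch below draws a training batch with equal representation from each domain. The domain names come from the article, but the uniform mixing ratio is an assumption for illustration, not the paper's reported recipe:

```python
import random

DOMAINS = ["math", "code", "science", "logic", "simulation", "tabular"]

def sample_mixed_batch(data_by_domain: dict[str, list[dict]], batch_size: int) -> list[dict]:
    """Draw a batch with roughly equal examples per domain (hypothetical mixing)."""
    per_domain = batch_size // len(DOMAINS)
    batch = []
    for domain in DOMAINS:
        batch.extend(random.sample(data_by_domain[domain], per_domain))
    random.shuffle(batch)  # avoid domain-ordered batches
    return batch
```

A difficulty-aware variant would also stratify within each domain, in line with the finding that hard-only training hurts accuracy on easier tasks elsewhere.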
Guru Model Training and Evaluation Strategy
The study trains 7B and 32B models on the Guru data to test whether combining multiple domains during RL improves reasoning skills. Using the verl framework and the GRPO algorithm, the models were evaluated across diverse tasks, including math, code, science, logic, simulation, and tabular reasoning, with consistent metrics. The results show that the Guru models outperform prior open baselines and generalize effectively to unseen tasks. Notably, Pass@k analysis revealed that performance depends on task type, model size, and decoding settings. Larger models gain more from RL, and tuning sampling parameters such as temperature and top-k helps improve output diversity and reasoning coverage.
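For reference, Pass@k is typically computed with the unbiased estimator introduced in the Codex paper; a minimal sketch follows, assuming (plausibly, though not confirmed by the article) that the study uses this standard formulation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k: probability that at least one of k completions,
    drawn from n samples of which c are correct, solves the problem."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k slots
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 4 correct -> Pass@8 ≈ 0.96.
print(pass_at_k(n=16, c=4, k=8))
```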
Summary: Generalizing LLM Reasoning with Guru
In conclusion, Guru is a curated RL dataset containing 92,000 high-quality, verifiable examples spanning six domains: Math, Code, Science, Logic, Simulation, and Tabular. Unlike previous RL research, which focused on math and code, Guru enables broader study of reasoning by providing reliable, domain-specific reward signals. The researchers trained two models, Guru-7B and Guru-32B, which achieve state-of-the-art results among open models across 17 evaluation tasks, particularly on domains underrepresented during pretraining. Their findings show that RL can both sharpen existing knowledge and foster genuinely new reasoning skills. All data, models, and code are publicly released to support further research on general reasoning.
Check out the Paper, Project Page, and GitHub Page. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.




