
Xiaomi Introduces MiMo-7B: A Compact Language Model that Matches Far Larger Models in Mathematical Reasoning and Code Generation through Reasoning-Centric Pretraining and Reinforcement Learning

With growing demand for AI systems that can handle tasks involving multi-step reasoning, mathematical proof, and software development, researchers have turned their attention to improving models' reasoning abilities. Reasoning, once believed to be exclusive to human intelligence, is now being actively pursued in smaller models to make them more efficient and broadly deployable. As practical applications continue to expand, spanning competitive mathematics, algorithmic problem solving, and complex software engineering, language models are expected to be more than versatile text generators. They are expected to solve problems that challenge even experts and researchers.

A central challenge in building reasoning-focused models is achieving strong, simultaneous performance in mathematics and programming while keeping the model size small. Most competitive results come from models with 32 billion parameters or more. These large architectures dominate because smaller base models generally struggle with reinforcement learning, especially on code-based problems: rewards are sparse, high-quality data is scarce, and weak base models make it difficult to develop compact yet powerful reasoners. Additionally, the data used to train such models is not always curated with reasoning in mind, often leading to inefficient training and limited gains in problem solving.

To address these reasoning challenges, several models, including OpenAI's o1, DeepSeek R1, and Claude 3.7, have been introduced, most with very large parameter counts. These models employ strategies such as planning and backtracking to improve reasoning, especially on algorithmic and mathematical tasks. However, they rely heavily on post-training stages and underemphasize the importance of high-quality data during pretraining. Many also depend on template-based reward heuristics that are prone to reward hacking. Recent benchmarks frequently show that such models underperform on challenging tasks because of suboptimal pretraining foundations and fragile reward signals.

A research team from Xiaomi introduced the MiMo-7B family of language models with an approach focused on overcoming these limitations. The innovation lies in treating pretraining and post-training as equally important to developing reasoning capabilities. The base model, MiMo-7B-Base, was trained from scratch on 25 trillion tokens. The dataset was built with a three-stage mixture strategy that progressively increases the share of mathematical and programming content. A multiple-token prediction (MTP) objective was also introduced during pretraining to improve both inference speed and model quality. For post-training, the team curated a set of 130,000 verifiable mathematics and programming problems, each labeled with a difficulty score. Reinforcement learning was then applied using a difficulty-driven reward framework, giving a denser and more effective training signal. This produced two RL-tuned variants: MiMo-7B-RL and MiMo-7B-RL-Zero.
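To make the multiple-token prediction idea concrete, below is a minimal sketch of how an MTP-style objective can be wired up in PyTorch. The toy backbone, layer sizes, and number of future-token heads are illustrative assumptions; the article does not specify MiMo's exact architecture or loss weighting.

```python
# Minimal sketch of a multiple-token prediction (MTP) objective.
# The GRU backbone and dimensions are placeholders, not MiMo's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMTPModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=256, n_future=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer
        # One output head per future offset: head k predicts token t+k+1.
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))
        self.n_future = n_future

    def forward(self, tokens):
        hidden, _ = self.backbone(self.embed(tokens))
        return [head(hidden) for head in self.heads]  # each: (batch, seq, vocab)

def mtp_loss(model, tokens):
    """Average the next-token loss over several future offsets, not just t+1."""
    loss = 0.0
    for k, logits in enumerate(model(tokens), start=1):
        # The prediction at position t is scored against the token at t+k.
        pred = logits[:, :-k, :].reshape(-1, logits.size(-1))
        target = tokens[:, k:].reshape(-1)
        loss = loss + F.cross_entropy(pred, target)
    return loss / model.n_future

batch = torch.randint(0, 32000, (4, 128))  # dummy token ids
print(mtp_loss(TinyMTPModel(), batch).item())
```

Beyond the training signal, the extra heads are what make speculative decoding possible at inference time: the model drafts several future tokens in one pass and verifies them cheaply.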

Pretraining began by extracting reasoning-heavy content from web pages, academic papers, and books using a custom HTML extraction tool designed to preserve mathematical equations and code snippets. Unlike standard pipelines, this preservation of structure is critical for problem-solving domains. The team also upgraded its PDF parsing tools to interpret scientific and programming content accurately. To prevent data repetition, global deduplication was applied using URL-based and MinHash-based strategies. The training corpus was then filtered with small language models fine-tuned to score content quality, replacing traditional heuristic filters that often discard valuable reasoning examples. High-quality synthetic reasoning data generated by stronger models was also added in the final stage of training. The three-stage mixture ultimately raised mathematics and code to 70 percent of the data in stage two, with an additional 10 percent of synthetic reasoning content in stage three. The maximum context length was extended from 8,192 to 32,768 tokens to ensure the model can handle long-form reasoning problems.
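As an illustration of the MinHash deduplication step mentioned above, here is a minimal self-contained sketch. The shingle size, number of hash functions, and similarity threshold are assumptions chosen for demonstration, not values from the MiMo report.

```python
# Minimal MinHash near-duplicate detection sketch (illustrative parameters).
import hashlib

NUM_HASHES = 64
SHINGLE_SIZE = 5

def shingles(text, n=SHINGLE_SIZE):
    """Break a document into overlapping word n-grams."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_signature(text):
    """Signature = per-seed minimum over salted hashes of the shingles."""
    sig = []
    for seed in range(NUM_HASHES):
        sig.append(min(
            int.from_bytes(hashlib.md5(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles(text)
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES

doc_a = "def parse(html): return extract_math_and_code(html)"
doc_b = "def parse(html): return extract_math_and_code(html)  # same page"
if estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b)) > 0.8:
    print("near-duplicate: keep only one copy")
```

At corpus scale, signatures like these are typically bucketed with locality-sensitive hashing so that only candidate pairs, rather than all pairs, are compared.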

For the reinforcement learning phase, the research team built a Seamless Rollout Engine to accelerate training and validation. The infrastructure combines asynchronous reward computation with early-termination of rollouts to reduce GPU idle time, yielding 2.29 times faster training and 1.96 times faster validation. The policy was optimized using fine-grained rewards derived from test-case evaluations, addressing the sparse-reward problem on coding benchmarks. An easy-data re-sampling strategy was also introduced to stabilize training and improve sample efficiency. Together, these techniques enabled the MiMo-7B variants to learn effectively, even in the cold-start setting where reinforcement learning begins directly from the pretrained base model.
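To illustrate how a test-difficulty-driven reward can densify the usual all-or-nothing pass/fail signal on code problems, here is a hypothetical sketch. The weighting scheme and the `historical_pass_rate` field are assumptions for illustration; the article describes the idea but not an exact formula.

```python
# Hypothetical sketch of a test-difficulty-driven reward for code RL.
from dataclasses import dataclass

@dataclass
class TestCase:
    passed: bool
    historical_pass_rate: float  # fraction of past rollouts that solved this test

def difficulty_weight(pass_rate: float) -> float:
    """Rarely-solved tests earn more credit; trivially-solved tests earn less."""
    return 1.0 - pass_rate

def dense_reward(tests: list[TestCase]) -> float:
    """Weighted fraction of passed tests: a denser signal than binary pass/fail."""
    total = sum(difficulty_weight(t.historical_pass_rate) for t in tests)
    earned = sum(difficulty_weight(t.historical_pass_rate) for t in tests if t.passed)
    return earned / total if total > 0 else 0.0

# A rollout that solves only the harder tests still receives meaningful reward,
# instead of zero under a strict all-tests-must-pass rule.
tests = [TestCase(False, 0.9), TestCase(True, 0.2), TestCase(True, 0.1)]
print(f"reward = {dense_reward(tests):.2f}")  # 0.94 under these weights
```

The key property is that partial progress on hard problems produces a nonzero gradient signal, which is exactly what sparse binary rewards fail to provide.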

Performance evaluations showed that MiMo-7B-Base scored 75.2 on the Big-Bench Hard (BBH) benchmark, surpassing other open-source 7B models. It also performed well on SuperGPQA, which includes graduate-level reasoning questions. The post-trained MiMo-7B-RL scored 55.4 on the AIME 2025 benchmark, exceeding OpenAI's o1-mini by 4.7 points. On code generation tasks, it outperformed much larger models such as DeepSeek-R1-Zero-32B and Qwen2.5-32B-RL-Zero on LiveCodeBench v5 and v6. These results show that a well-designed 7B model can rival or outperform models with four times as many parameters.

The MiMo-7B project stands as a concrete demonstration that pretraining, data quality, and reinforcement learning infrastructure all contribute to a language model's final reasoning ability. With a pipeline that spans from data extraction through reward design, the Xiaomi research team has produced a compact but powerful model well suited to real-world tasks in mathematics, coding, and logic. Their approach highlights the potential of smaller models and challenges the assumption that size alone determines intelligence or versatility.

Key Takeaways from the research on MiMo-7B:

  1. MiMo-7B was trained on a large dataset of 25 trillion tokens, targeting reasoning tasks through a structured three-stage data mixture.
  2. 130,000 mathematics and code problems were used for RL training, each labeled with a difficulty score to enable effective reward shaping.
  3. Pretraining upsampled mathematics and code to 70 percent of the data, followed by a final stage with 10 percent synthetic problem-solving content.
  4. The Seamless Rollout Engine accelerated RL training by 2.29 times and validation by 1.96 times.
  5. MiMo-7B-RL scored 55.4 on AIME 2025, exceeding OpenAI's o1-mini by 4.7 points.
  6. The MiMo-7B models are publicly available and include all checkpoints: base, SFT, and RL variants.
  7. The models' success shows that well-designed smaller models can match or exceed the reasoning performance of 32B models.

Check out the Paper and GitHub Page.

Nikhil is an intern consultant at MarktechPost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Materials Science, he is exploring new advancements and creating opportunities to contribute.
