ReVisual-R1: An Open-Source 7B Multimodal Large Language Model (MLLM) that Achieves Long, Accurate, and Thoughtful Reasoning

The Challenge of Multimodal Reasoning
Recent breakthroughs in text-based language models, such as DeepSeek-R1, have shown that reinforcement learning (RL) can help develop strong reasoning skills. Motivated by this, researchers have tried to apply the same RL strategies to MLLMs to improve their ability to reason over both visual and textual inputs. However, these efforts have not been fully successful: MLLMs still struggle with complex reasoning tasks. This suggests that simply reusing RL recipes from text-only models may not work well in multimodal settings, where the interplay between different data types presents new challenges that require tailored methods.
The Evolution of Multimodal Language Models
Recent research on MLLMs extends the progress of LLMs by incorporating visual inputs. Early models such as CLIP and MiniGPT-4 laid the groundwork, followed by instruction-tuned models such as LLaVA. While closed-source models demonstrate strong reasoning through long chain-of-thought (CoT) outputs, open-source models have focused mostly on fine-tuning and CoT distillation. However, this often yields short answers that lack reasoning depth. RL, including techniques such as RLHF and GRPO, has shown promise for improving reasoning in LLMs. Inspired by this, recent work now aims to apply RL to MLLMs to elicit richer, longer-form reasoning.
Introducing ReVisual-R1
Researchers from Tsinghua University, Shanghai Jiao Tong University, and the Shanghai Artificial Intelligence Laboratory introduce ReVisual-R1, a 7B open-source MLLM. Their study reveals key insights: (1) carefully curated text-only data provides a strong cold start, surpassing many existing MLLMs even before RL; (2) the commonly used GRPO algorithm suffers from gradient stagnation during multimodal RL, which they mitigate by prioritizing high-advantage samples; and (3) adding a final text-only RL stage after multimodal RL further enhances reasoning. Their three-stage approach, comprising a text cold start, multimodal RL, and final text-only RL, strikes an effective balance between visual grounding and deep cognitive reasoning.
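The gradient-stagnation issue behind insight (2) can be illustrated with a minimal sketch. This is not the authors' implementation: `group_advantages` shows the standard GRPO-style group normalization, and `prioritize` is a hypothetical helper showing the general idea of concentrating updates on high-advantage rollouts.

```python
import numpy as np

def group_advantages(rewards):
    """GRPO-style group-normalized advantages: (r - mean) / std."""
    r = np.asarray(rewards, dtype=float)
    std = r.std()
    if std < 1e-8:
        # All rollouts in the group got the same reward: every advantage
        # is zero, so the group contributes no policy gradient at all.
        # This is the "gradient stagnation" failure mode.
        return np.zeros_like(r)
    return (r - r.mean()) / std

def prioritize(advantages, k):
    """Hypothetical prioritization: keep the k samples with the largest
    |advantage|, concentrating updates on informative rollouts."""
    return np.argsort(-np.abs(np.asarray(advantages)))[:k]

# A group where every rollout fails yields zero learning signal.
print(group_advantages([0, 0, 0, 0]))  # -> [0. 0. 0. 0.]

# A mixed group yields nonzero advantages; keep the strongest two.
adv = group_advantages([1, 0, 0, 1])
print(adv, prioritize(adv, 2))
```

The sketch shows why uniform rewards within a sampled group (all correct or all wrong) stall GRPO, and why biasing updates toward samples with large advantage magnitude restores useful gradients.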
Developing the GRAMMAR Dataset
The GRAMMAR dataset was developed after the researchers observed that existing multimodal cold-start datasets lack the depth needed to train strong reasoning models. Text-only datasets, such as DeepMath, produced better gains on both text and multimodal tasks, suggesting that textual complexity is what best stimulates reasoning. To address this, GRAMMAR combines diverse text-only and multimodal samples through a multi-stage curation process. This data then fuels the Staged Reinforcement Optimization (SRO) framework used to train a stronger model.
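One pass of such a multi-stage curation can be sketched as a difficulty filter. This is a hedged illustration only: the field names (`solution_steps`, `pass_rate`) and thresholds are assumptions, not the authors' actual schema or criteria.

```python
# Hypothetical difficulty-filtering pass in the spirit of multi-stage
# data curation: keep problems that demand multi-step reasoning and are
# not already solved too easily by a reference model.
def filter_by_difficulty(samples, min_steps=3, max_pass_rate=0.8):
    kept = []
    for s in samples:
        needs_reasoning = s["solution_steps"] >= min_steps
        not_too_easy = s["pass_rate"] <= max_pass_rate
        if needs_reasoning and not_too_easy:
            kept.append(s)
    return kept

pool = [
    {"id": "easy", "solution_steps": 1, "pass_rate": 0.95},
    {"id": "hard", "solution_steps": 6, "pass_rate": 0.30},
]
print([s["id"] for s in filter_by_difficulty(pool)])  # -> ['hard']
```

Chaining several such passes (deduplication, difficulty, answer verifiability) is the general pattern a multi-stage curation pipeline follows.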
A Three-Stage Pipeline
The ReVisual-R1 training process follows a three-stage curriculum: a text-only cold start, multimodal reinforcement learning, and a final text-only RL stage. Evaluated across a range of benchmarks, the model significantly outperformed open-source peers and even some commercial models on multimodal and math reasoning tasks, achieving top results on 9 of 10 benchmarks. Ablation studies confirmed the importance of the training order and of the advantage-prioritization mechanism, which helped the model focus on high-quality responses and led to significant gains in overall performance.
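The curriculum above can be sketched as a simple staged loop. This is a minimal sketch, assuming a generic `train` callable; the stage names come from the article, while the data labels and config shape are placeholders, not the authors' code.

```python
# Minimal sketch of the three-stage curriculum. Each stage hands its
# updated model to the next, so later stages build on earlier ones.
STAGES = [
    {"name": "text_cold_start", "data": "text_only_cot",  "method": "sft"},
    {"name": "multimodal_rl",   "data": "image_text_mix", "method": "rl"},
    {"name": "text_rl",         "data": "text_only",      "method": "rl"},
]

def run_pipeline(model, train):
    """Run every stage in order; `train` is any (model, stage) -> model."""
    for stage in STAGES:
        model = train(model, stage)
    return model
```

The key design choice the article describes is the ordering: reasoning depth is established in text first, grounded in images second, and sharpened in text again last.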
Summary and Contributions
In conclusion, ReVisual-R1 is an open-source 7B MLLM that tackles the challenges of multimodal reasoning. Instead of relying on sheer scale, it uses a carefully designed three-stage process: a cold start on high-quality text-only reasoning data, followed by a multimodal RL phase stabilized by advantage prioritization, and finishing with a final text-only RL refinement. This curriculum substantially boosts performance. ReVisual-R1 sets a new benchmark among 7B models, excelling on tasks such as MathVerse and AIME. The work highlights how structured training can unlock deeper reasoning in MLLMs.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at MarktechPost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.




