Reactive Machines

mAceReason-Math: A High-Quality Multilingual Math Problem Dataset Ready for RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) has been applied successfully to significantly improve the capabilities of pretrained large language models, especially on mathematical and reasoning problems. However, current research and available training datasets remain English-centric. Although multilingual training data and benchmarks have been created in the past, they were not designed with RLVR and the capabilities of current models in mind, and their difficulty is often too low to provide a suitable training signal for current models. To address this gap, we provide mAceReason-Math, a dataset of high-quality translations of challenging math problems drawn from a corpus curated for RLVR (AceReason-Math). We also take special care to clean and improve our translations, resulting in coverage of 14 languages with over 10,000 samples per language. We release the dataset to support multilingual RLVR research and benchmarking in the research community.
