mAceReason-Math: A High-Quality Multilingual Math Problem Dataset Ready for RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) has been applied successfully to significantly improve the capabilities of pretrained large language models, especially on math and reasoning problems. However, current research and the available training datasets remain English-centric. Although multilingual training data and benchmarks have been created in the past, they were not designed with RLVR or the capabilities of current models in mind, and their difficulty is often too low to provide a useful training signal for current models. To address this gap, we provide mAceReason-Math, a dataset of high-quality translations of challenging math problems drawn from a corpus curated specifically for RLVR (AceReason-Math). We also take particular care to clean and improve our translations, resulting in coverage of 14 languages with more than 10,000 samples per language. We release the dataset to support multilingual RLVR research and benchmarking in the research community.
- † Hasso Plattner Institute and ELLIS Unit Potsdam
- ** Work done while at Apple
- ‡ Equal contribution



