Big-Math-RL-Verified-Processed
ModelScope Community | Updated 2026-01-09 | Indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/open-r1/Big-Math-RL-Verified-Processed
Resource description:
# Dataset Card for Big-Math-RL-Verified-Processed
This is a processed version of [SynthLabsAI/Big-Math-RL-Verified](https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified) where we have applied the following filters:
1. Removed samples where `llama8b_solve_rate` is `None`
2. Removed samples that could not be parsed by `math-verify` (i.e., samples for which parsing returned an empty list)
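The two filters above can be sketched in plain Python. This is a minimal illustration, not the contents of `create_dataset.py`. It assumes that each sample is a dict and that the field handed to `math-verify` is called `answer`; that field name is an assumption and is not stated in this card. The parser is passed in as an argument, so `math_verify.parse` (which returns a list) can be plugged in. The function name `filter_samples` and the toy parser are hypothetical.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]


def filter_samples(
    samples: List[Sample],
    parse: Callable[[str], list],
    answer_key: str = "answer",  # assumed field name, not confirmed by the card
) -> List[Sample]:
    """Hypothetical sketch of the two filters described in the card."""
    kept = []
    for sample in samples:
        # Filter 1: drop samples whose Llama-8B solve rate is missing (None).
        if sample.get("llama8b_solve_rate") is None:
            continue
        # Filter 2: drop samples the parser cannot handle (empty result list).
        if len(parse(sample[answer_key])) == 0:
            continue
        kept.append(sample)
    return kept


if __name__ == "__main__":
    # Toy stand-in parser for illustration only; with math-verify installed
    # you could pass math_verify.parse instead.
    def toy_parse(text: str) -> list:
        return [text] if text.strip() else []

    data = [
        {"answer": "42", "llama8b_solve_rate": 0.5},
        {"answer": "7", "llama8b_solve_rate": None},
        {"answer": "", "llama8b_solve_rate": 0.3},
    ]
    print(filter_samples(data, toy_parse))
```

In this sketch the parser is injected rather than imported. That keeps the filter logic testable without `math-verify` installed and makes the assumed answer field easy to change.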
We have also created five additional subsets that indicate difficulty level, similar to the five difficulty levels of the MATH dataset. To build them, we computed the quintiles of the `llama8b_solve_rate` values and then split the dataset into the corresponding bins.
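The quintile binning can be illustrated with the Python standard library alone. This is a minimal sketch under stated assumptions, not the method in `create_dataset.py`:

- The four cut points come from `statistics.quantiles(..., n=5)`. Its default interpolation may differ from whatever the actual script uses.
- A higher solve rate means an easier problem, so the top quintile is mapped to level 1, by analogy with MATH, where Level 1 is the easiest.

The function names and the level numbering are hypothetical.

```python
import bisect
import statistics
from typing import Any, Dict, List

Sample = Dict[str, Any]


def quintile_cutpoints(rates: List[float]) -> List[float]:
    # n=5 returns the 4 cut points that separate five equal-probability bins.
    return statistics.quantiles(rates, n=5)


def assign_level(rate: float, cuts: List[float]) -> int:
    # bisect_right yields 0..4: the index of the bin containing `rate`,
    # where bin 0 holds the lowest solve rates.
    bin_index = bisect.bisect_right(cuts, rate)
    # Assumption: highest solve rate = easiest = level 1 (as in MATH).
    return 5 - bin_index


def split_by_level(samples: List[Sample]) -> Dict[int, List[Sample]]:
    rates = [s["llama8b_solve_rate"] for s in samples]
    cuts = quintile_cutpoints(rates)
    levels: Dict[int, List[Sample]] = {level: [] for level in range(1, 6)}
    for s in samples:
        levels[assign_level(s["llama8b_solve_rate"], cuts)].append(s)
    return levels


if __name__ == "__main__":
    # Toy data: ten samples with evenly spaced solve rates 0.0 .. 0.9.
    toy = [{"id": i, "llama8b_solve_rate": i / 10} for i in range(10)]
    for level, items in split_by_level(toy).items():
        print(level, [s["llama8b_solve_rate"] for s in items])
```

With evenly spaced toy rates, each level receives two samples. Real solve rates may share identical values. Because `bisect_right` sends every tied value to the same bin, the five subsets need not be equal in size.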
The full dataset processing logic can be found in `create_dataset.py`.
If you find this dataset useful in your work, please cite the original source:
```
@misc{albalak2025bigmathlargescalehighqualitymath,
title={Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models},
author={Alon Albalak and Duy Phung and Nathan Lile and Rafael Rafailov and Kanishk Gandhi and Louis Castricato and Anikait Singh and Chase Blagden and Violet Xiang and Dakota Mahan and Nick Haber},
year={2025},
eprint={2502.17387},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.17387},
}
```
Provider:
maas
Created:
2025-04-22



