JWei05/DAPO-OpenMathInstruct2-34k
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/JWei05/DAPO-OpenMathInstruct2-34k
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 10K<n<100K
tags:
- math
- reasoning
- rl
- grpo
- dapo
---
# DAPO + OpenMathInstruct-2 Mix (34k)
A 50/50 mix of two math-reasoning datasets, used for RL training of Gemma 3 PT (pretrained) models with DAPO, a GRPO variant.
## Composition
| Source | Rows | Description |
|--------|------|-------------|
| [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | 17,398 | DAPO training set (AoPS + competition math) |
| [`nvidia/OpenMathInstruct-2`](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) subset | 17,398 | Synthetic augmented math problems |
| **Total** | **34,796** | |
Within the OpenMathInstruct-2 subset:
- 14,529 `augmented_math` (competition-style augmentations)
- 2,372 `augmented_gsm8k` (grade school augmentations)
- 248 `math` (original MATH)
- 249 `gsm8k` (original GSM8K)
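
As a quick sanity check on the figures above, the per-source counts can be summed and compared with the table. This is only an arithmetic sketch: the numbers are copied from this card, and the variable names are illustrative.

```python
# Counts copied from the composition table and the breakdown list above.
OPENMATH2_BREAKDOWN = {
    "augmented_math": 14_529,
    "augmented_gsm8k": 2_372,
    "math": 248,
    "gsm8k": 249,
}
DAPO_ROWS = 17_398

openmath2_rows = sum(OPENMATH2_BREAKDOWN.values())
total_rows = DAPO_ROWS + openmath2_rows

# Share of each problem source within the OpenMathInstruct-2 half.
shares = {
    name: count / openmath2_rows
    for name, count in OPENMATH2_BREAKDOWN.items()
}

# The breakdown should match the 50/50 split and the stated total.
assert openmath2_rows == DAPO_ROWS
assert total_rows == 34_796
```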
## Schema (verl-compatible)
Each row is formatted for [verl](https://github.com/volcengine/verl) RL training:
```python
{
    "data_source": "math",
    "prompt": [
        {"content": "Problem... Please output the final answer within \\boxed{}.", "role": "user"}
    ],
    "reward_model": {"ground_truth": "42", "style": "rule"},
    "extra_info": {
        "index": "openmath2-972",  # or UUID for DAPO-Math rows
        "original_question": "...",
        "problem_source": "augmented_math",
        "split": "train"
    }
}
```
- `data_source=math` routes to `math_verify` grading in verl.
- `prompt` is ready for chat-template formatting.
- `ground_truth` is extracted from the solution text: the answer after `#### ` for GSM8K-style solutions, or the final answer expression otherwise.
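
The notes above can be illustrated with a minimal sketch of how one row might be assembled. It is not the actual mixing script: `extract_gsm8k_answer` and `make_row` are hypothetical helpers, and the way the instruction suffix is joined to the question is an assumption based on the example row.

```python
from typing import Optional

# Assumption: the suffix is appended after a single space, as the example row suggests.
PROMPT_SUFFIX = " Please output the final answer within \\boxed{}."


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    """Return the text after the last '#### ' marker, or None if it is absent."""
    marker = "#### "
    pos = solution.rfind(marker)
    if pos == -1:
        return None
    lines = solution[pos + len(marker):].splitlines()
    return lines[0].strip() if lines else None


def make_row(question: str, ground_truth: str, index: str, problem_source: str) -> dict:
    """Build one row with the field layout shown in the schema above."""
    return {
        "data_source": "math",
        "prompt": [{"content": question + PROMPT_SUFFIX, "role": "user"}],
        "reward_model": {"ground_truth": ground_truth, "style": "rule"},
        "extra_info": {
            "index": index,
            "original_question": question,
            "problem_source": problem_source,
            "split": "train",
        },
    }


# Illustrative input only; this problem is not taken from the dataset.
answer = extract_gsm8k_answer("6 * 7 = 42\n#### 42")
row = make_row("What is 6 * 7?", answer, "openmath2-0", "gsm8k")
```

Extraction of a "final answer expression" for rows without a `#### ` marker is left out here, since the card does not specify how it is done.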
## Usage
```python
from datasets import load_dataset

# The repository id uses a hyphen, matching the dataset URL.
ds = load_dataset("JWei05/DAPO-OpenMathInstruct2-34k", split="train")
```
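
After loading, it may be useful to confirm that rows match the schema described above before handing them to a trainer. The sketch below is one way to do that; `check_row` and `load_and_check` are hypothetical helpers, the repository id is taken from the download URL, and the required keys are inferred from the example row rather than from a formal specification.

```python
from typing import List

REQUIRED_KEYS = {"data_source", "prompt", "reward_model", "extra_info"}


def check_row(row: dict) -> List[str]:
    """Return a list of schema problems found in one row (empty if none)."""
    missing = REQUIRED_KEYS - set(row.keys())
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    prompt = row["prompt"]
    if not prompt:
        problems.append("prompt is empty")
    for message in prompt:
        if "role" not in message or "content" not in message:
            problems.append("prompt message lacks 'role' or 'content'")
    if not isinstance(row["reward_model"].get("ground_truth"), str):
        problems.append("reward_model.ground_truth is not a string")
    return problems


def load_and_check(repo_id: str = "JWei05/DAPO-OpenMathInstruct2-34k", limit: int = 100):
    """Load the train split and check the first `limit` rows (needs network access)."""
    from datasets import load_dataset  # imported lazily so check_row works offline

    ds = load_dataset(repo_id, split="train")
    for i in range(min(limit, len(ds))):
        problems = check_row(ds[i])
        if problems:
            raise ValueError(f"row {i}: {problems}")
    return ds
```

Calling `load_and_check()` returns the loaded split if the sampled rows pass, and raises `ValueError` on the first row that does not.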
## Reproducing
See `rl-distill/dapo/data/make_dapo_openmath2_mix.py` (part of the training pipeline).
## Citation
If you use this dataset, please cite the source datasets:
```
@article{yu2025dapo,
  title={DAPO: An Open-Source LLM Reinforcement Learning System at Scale},
  author={Yu, Qiying and others},
  journal={arXiv preprint arXiv:2503.14476},
  year={2025}
}

@article{toshniwal2024openmathinstruct,
  title={OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data},
  author={Toshniwal, Shubham and others},
  journal={arXiv preprint arXiv:2410.01560},
  year={2024}
}
```
Provider: JWei05



