
JWei05/DAPO-OpenMathInstruct2-34k

Source: Hugging Face. Updated 2026-04-20. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/JWei05/DAPO-OpenMathInstruct2-34k
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 10K<n<100K
tags:
- math
- reasoning
- rl
- grpo
- dapo
---

# DAPO + OpenMathInstruct-2 Mix (34k)

A 50/50 mix of two math-reasoning datasets used for RL training of Gemma 3 PT models with DAPO (GRPO).

## Composition

| Source | Rows | Description |
|--------|------|-------------|
| [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | 17,398 | DAPO training set (AoPS + competition math) |
| [`nvidia/OpenMathInstruct-2`](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) subset | 17,398 | Synthetic augmented math problems |
| **Total** | **34,796** | |

Within the OpenMathInstruct-2 subset:

- 14,529 `augmented_math` (competition-style augmentations)
- 2,372 `augmented_gsm8k` (grade school augmentations)
- 248 `math` (original MATH)
- 249 `gsm8k` (original GSM8K)

## Schema (verl-compatible)

Each row is formatted for [verl](https://github.com/volcengine/verl) RL training:

```python
{
    "data_source": "math",
    "prompt": [
        {"content": "Problem... Please output the final answer within \\boxed{}.", "role": "user"}
    ],
    "reward_model": {"ground_truth": "42", "style": "rule"},
    "extra_info": {
        "index": "openmath2-972",  # or UUID for DAPO-Math rows
        "original_question": "...",
        "problem_source": "augmented_math",
        "split": "train"
    }
}
```

- `data_source=math` routes to `math_verify` grading in verl.
- `prompt` is ready for chat-template formatting.
- `ground_truth` has been extracted from the solution text (the answer after `#### ` for GSM8K-style solutions, or the final answer expression otherwise).

## Usage

```python
from datasets import load_dataset

# Repository id as given in the dataset title and download link.
ds = load_dataset("JWei05/DAPO-OpenMathInstruct2-34k", split="train")
```

## Reproducing

See `rl-distill/dapo/data/make_dapo_openmath2_mix.py` (part of the training pipeline).

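The referenced script is not reproduced here. As a rough illustration of the row format described under Schema, the sketch below builds one verl-style row from a GSM8K-style solution. The helper names (`extract_gsm8k_answer`, `make_row`), the sample question, and the index value are hypothetical, and appending the `\boxed{}` instruction as a fixed suffix is an assumption based on the schema example, not a description of what the actual pipeline does.

```python
# Assumed prompt suffix, copied from the schema example above.
BOXED_SUFFIX = " Please output the final answer within \\boxed{}."


def extract_gsm8k_answer(solution: str) -> str:
    """Return the text after the last '#### ' marker of a GSM8K-style solution."""
    marker = "#### "
    idx = solution.rfind(marker)
    if idx == -1:
        raise ValueError("no '#### ' marker found in solution")
    return solution[idx + len(marker):].strip()


def make_row(question: str, ground_truth: str, index: str,
             problem_source: str, split: str = "train") -> dict:
    """Assemble one row with the fields listed in the schema section."""
    return {
        "data_source": "math",
        "prompt": [{"content": question + BOXED_SUFFIX, "role": "user"}],
        "reward_model": {"ground_truth": ground_truth, "style": "rule"},
        "extra_info": {
            "index": index,
            "original_question": question,
            "problem_source": problem_source,
            "split": split,
        },
    }


# Hypothetical example input; not taken from the dataset.
row = make_row(
    question="Tom has 3 apples and buys 4 more. How many apples does he have?",
    ground_truth=extract_gsm8k_answer("3 + 4 = 7\n#### 7"),
    index="openmath2-0",
    problem_source="gsm8k",
)
print(row["reward_model"])
```

Taking the last `#### ` marker rather than the first is a defensive choice in this sketch, in case the marker also appears earlier in the solution text.
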
## Citation

If you use this dataset, please cite the source datasets:

```
@article{yu2025dapo,
  title={DAPO: An Open-Source LLM Reinforcement Learning System at Scale},
  author={Yu, Qiying and others},
  journal={arXiv preprint arXiv:2503.14476},
  year={2025}
}

@article{toshniwal2024openmathinstruct,
  title={OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data},
  author={Toshniwal, Shubham and others},
  journal={arXiv preprint arXiv:2410.01560},
  year={2024}
}
```

Provider: JWei05