JWei05/DAPO-OpenMathInstruct2-34k

Name: JWei05/DAPO-OpenMathInstruct2-34k
Creator: JWei05
Published: 2026-04-20 21:58:22
License: MIT

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/JWei05/DAPO-OpenMathInstruct2-34k

Description:

---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 10K<n<100K
tags:
- math
- reasoning
- rl
- grpo
- dapo
---

# DAPO + OpenMathInstruct-2 Mix (34k)

A 50/50 mix of two math-reasoning datasets used for RL training of Gemma 3 PT models with DAPO (GRPO).

## Composition

| Source | Rows | Description |
|--------|------|-------------|
| [`open-r1/DAPO-Math-17k-Processed`](https://huggingface.co/datasets/open-r1/DAPO-Math-17k-Processed) | 17,398 | DAPO training set (AoPS + competition math) |
| [`nvidia/OpenMathInstruct-2`](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) subset | 17,398 | Synthetic augmented math problems |
| **Total** | **34,796** | |

Within the OpenMathInstruct-2 subset:

- 14,529 `augmented_math` (competition-style augmentations)
- 2,372 `augmented_gsm8k` (grade school augmentations)
- 248 `math` (original MATH)
- 249 `gsm8k` (original GSM8K)

## Schema (verl-compatible)

Each row is formatted for [verl](https://github.com/volcengine/verl) RL training:

```python
{
    "data_source": "math",
    "prompt": [
        {"content": "Problem... Please output the final answer within \\boxed{}.", "role": "user"}
    ],
    "reward_model": {"ground_truth": "42", "style": "rule"},
    "extra_info": {
        "index": "openmath2-972",  # or UUID for DAPO-Math rows
        "original_question": "...",
        "problem_source": "augmented_math",
        "split": "train"
    }
}
```

- `data_source=math` routes to `math_verify` grading in verl.
- `prompt` is ready for chat-template formatting.
- `ground_truth` has been extracted from the solution text (the answer after `#### ` for GSM8K-style solutions, or the final answer expression otherwise).

## Usage

```python
from datasets import load_dataset

# The repository ID matches the dataset name and URL (hyphen, not "+").
ds = load_dataset("JWei05/DAPO-OpenMathInstruct2-34k", split="train")
```

## Reproducing

See `rl-distill/dapo/data/make_dapo_openmath2_mix.py` (part of the training pipeline).
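As a rough illustration of the schema described in this card, the sketch below builds one verl-style row from a question and its solution text. It is not the author's `make_dapo_openmath2_mix.py` script: `extract_ground_truth`, `make_row`, and `PROMPT_SUFFIX` are hypothetical names, and both the `\boxed{...}` fallback and the single-space join between the question and the suffix are assumptions about how the "final answer expression" and prompt might be produced.

```python
# Hypothetical sketch; not taken from the dataset's build script.
PROMPT_SUFFIX = " Please output the final answer within \\boxed{}."


def extract_ground_truth(solution: str) -> str:
    """Return the final answer found in a solution string."""
    # GSM8K-style solutions end with a marker such as "#### 42".
    if "####" in solution:
        return solution.rsplit("####", 1)[1].strip()
    # Assumption: other solutions wrap the final answer in \boxed{...}.
    start = solution.rfind("\\boxed{")
    if start == -1:
        raise ValueError("no final answer marker found")
    i = start + len("\\boxed{")
    depth, chars = 1, []
    # Walk forward, tracking nested braces until \boxed{ is closed.
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    raise ValueError("unbalanced braces in \\boxed{...}")


def make_row(question: str, solution: str, index: str, problem_source: str) -> dict:
    """Assemble one row with the fields listed in the schema."""
    return {
        "data_source": "math",
        "prompt": [{"content": question + PROMPT_SUFFIX, "role": "user"}],
        "reward_model": {
            "ground_truth": extract_ground_truth(solution),
            "style": "rule",
        },
        "extra_info": {
            "index": index,
            "original_question": question,
            "problem_source": problem_source,
            "split": "train",
        },
    }


row = make_row("What is 6 * 7?", "6 * 7 = 42\n#### 42", "openmath2-0", "gsm8k")
```

Tracking brace depth rather than using a simple regular expression keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.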
## Citation

If you use this dataset, please cite the source datasets:

```
@article{yu2025dapo,
  title={DAPO: An Open-Source LLM Reinforcement Learning System at Scale},
  author={Yu, Qiying and others},
  journal={arXiv preprint arXiv:2503.14476},
  year={2025}
}

@article{toshniwal2024openmathinstruct,
  title={OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data},
  author={Toshniwal, Shubham and others},
  journal={arXiv preprint arXiv:2410.01560},
  year={2024}
}
```

