DeepMath-103K
Source: ModelScope community (updated 2026-05-16, indexed 2025-04-19)
Download link: https://modelscope.cn/datasets/AI-ModelScope/DeepMath-103K
# DeepMath-103K
<table>
<tr>
<td style="padding: 0;">
<a href="https://huggingface.co/datasets/zwhe99/DeepMath-103K">
<img src="https://img.shields.io/badge/Data-4d5eff?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor" alt="Data">
</a>
</td>
<td style="padding: 0;">
<a href="https://huggingface.co/collections/zwhe99/deepmath-6816e139b7f467f21a459a9a">
<img src="https://img.shields.io/badge/Model-4d5eff?style=for-the-badge&logo=huggingface&logoColor=ffffff&labelColor" alt="Model">
</a>
</td>
<td style="padding: 0;">
<a href="https://github.com/zwhe99/DeepMath">
<img src="https://img.shields.io/badge/Code-000000?style=for-the-badge&logo=github&logoColor=white" alt="Code">
</a>
</td>
<td style="padding: 0;">
<a href="https://arxiv.org/abs/2504.11456">
<img src="https://img.shields.io/badge/arXiv-2504.11456-b31b1b.svg?style=for-the-badge" alt="arXiv">
</a>
</td>
</tr>
</table>
## 🔥 News
- **May 8, 2025**: We found that 48 samples contained hints that revealed the answers. The relevant questions have now been revised to remove the leaked answers.
- **April 14, 2025**: We release **`DeepMath-103K`**, a large-scale dataset of challenging, verifiable, and decontaminated math problems tailored for RL and SFT. The open-sourced resources are listed below.
## 📦 Resources
- 🤗 Training data: [`DeepMath-103K`](https://huggingface.co/datasets/zwhe99/DeepMath-103K)
- 🤗 Model weights: [`DeepMath-Zero-7B`](https://huggingface.co/zwhe99/DeepMath-Zero-7B), [`DeepMath-Zero-Math-7B`](https://huggingface.co/zwhe99/DeepMath-Zero-Math-7B), [`DeepMath-1.5B`](https://huggingface.co/zwhe99/DeepMath-1.5B), [`DeepMath-Omn-1.5B`](https://huggingface.co/zwhe99/DeepMath-Omn-1.5B)
- 💻 Code: [`DeepMath`](https://github.com/zwhe99/DeepMath)
- 📝 Paper: [`arXiv:2504.11456`](https://arxiv.org/abs/2504.11456)
## 📖 Overview
**`DeepMath-103K`** is meticulously curated to push the boundaries of mathematical reasoning in language models. Key features include:
**1. Challenging Problems**: DeepMath-103K has a strong focus on difficult mathematical problems (primarily Levels 5-9), significantly raising the complexity bar compared to many existing open datasets.
<div align="center"> <img src="./assets/github-difficulty.png" width="600"/>
<sub>Difficulty distribution comparison.</sub> </div>
**2. Data Diversity and Novelty**: DeepMath-103K spans a wide spectrum of mathematical subjects, including Algebra, Calculus, Number Theory, Geometry, Probability, and Discrete Mathematics.
<div align="center"> <img src="./assets/github-domain.png" width="400"/>
<sub>Hierarchical breakdown of mathematical topics covered in DeepMath-103K.</sub></div>
The problems in DeepMath-103K are novel and unique, whereas many existing datasets closely resemble and overlap with one another.
<div align="center"> <img src="./assets/github-tsne.png" width="600"/>
<sub>Embedding distributions of different datasets.</sub></div>
**3. Rigorous Decontamination**: Built from diverse sources, DeepMath-103K underwent meticulous decontamination against common benchmarks using semantic matching. This minimizes test set leakage and promotes fair model evaluation.
<div align="center"> <img src="./assets/github-contamination-case.png" width="600"/>
<sub>Detected contamination examples. Subtle conceptual overlaps can also be identified.</sub> </div>
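The idea behind similarity-based decontamination can be illustrated with a minimal sketch. It is not the authors' pipeline: a bag-of-words cosine similarity stands in for whatever semantic matcher was actually used, and the function names and the `0.8` threshold are hypothetical choices made only for this example.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Lowercased bag-of-words counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decontaminate(candidates, benchmark, threshold=0.8):
    """Keep candidates whose best benchmark match stays below the threshold."""
    bench_vectors = [bow_vector(q) for q in benchmark]
    kept = []
    for question in candidates:
        vec = bow_vector(question)
        best = max((cosine(vec, b) for b in bench_vectors), default=0.0)
        if best < threshold:  # hypothetical cutoff, not taken from the paper
            kept.append(question)
    return kept


benchmark = ["Find the sum of all positive divisors of 36."]
candidates = [
    "Find the sum of all the positive divisors of 36.",  # near-duplicate
    "Evaluate the integral of x^2 from 0 to 3.",         # unrelated
]
clean = decontaminate(candidates, benchmark)
print(clean)
```

A lexical measure like this only catches near-verbatim overlap. Detecting the subtle conceptual overlaps mentioned in the caption above would require a genuinely semantic comparison, which this sketch does not attempt.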
**4. Rich Data Format**: Each sample in DeepMath-103K is structured with rich information to support various research applications:
<div align="center"> <img src="./assets/github-data-sample.png" width="600"/>
<sub>An example data sample from DeepMath-103K.</sub> </div>
- **Question**: The mathematical problem statement.
- **Final Answer**: A reliably verifiable final answer, enabling robust rule-based reward functions for RL.
- **Difficulty**: A numerical score for difficulty-aware training or analysis.
- **Topic**: Hierarchical classification for topic-specific applications.
- **R1 Solutions**: Three distinct reasoning paths from DeepSeek-R1, valuable for supervised fine-tuning (SFT) or knowledge distillation.
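To make the fields above concrete, the following sketch models one sample in memory and scores a model response with a simple rule-based reward. The class, its attribute names, and both helper functions are hypothetical, and the exact-string comparison is a simplification chosen for illustration. The Quick Start environment installs `math-verify`, which is likely a better fit when mathematically equivalent answers should also be accepted.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical in-memory view of one record; attribute names are assumptions."""
    question: str
    final_answer: str
    difficulty: float
    topic: str
    r1_solutions: list = field(default_factory=list)


def extract_boxed(response: str):
    """Return the contents of the last \\boxed{...} in a response, or None."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in response[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced


def rule_based_reward(response: str, sample: Sample) -> float:
    """1.0 if the boxed answer equals the reference (ignoring whitespace), else 0.0."""
    predicted = extract_boxed(response)
    if predicted is None:
        return 0.0
    strip_ws = lambda s: "".join(s.split())
    return 1.0 if strip_ws(predicted) == strip_ws(sample.final_answer) else 0.0


# Made-up samples, used only to exercise the helpers.
samples = [
    Sample("Compute 1/4 + 1/4.", "\\frac{1}{2}", 5.0, "Algebra"),
    Sample("A harder made-up problem.", "0", 8.0, "Calculus"),
]
hard = [s for s in samples if s.difficulty >= 7]  # difficulty-aware filtering
reward = rule_based_reward("So the answer is \\boxed{\\frac{1}{2}}.", samples[0])
```

Keeping the reward a pure function of the response and the reference answer is what makes it usable as a verifiable RL signal: no learned judge is involved.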
## 📊 Main Results
DeepMath series models achieve many **SOTA** results on challenging math benchmarks:
<div align="center"> <img src="./assets/github-main.png" width="100%"/>
<sub>Math reasoning performance.</sub> </div>
## 🎯 Quick Start
#### Environment Preparation
```shell
git clone --recurse-submodules https://github.com/zwhe99/DeepMath.git && cd DeepMath
conda create -y -n deepmath python=3.12.2 && conda activate deepmath
pip3 install ray[default]
pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn==2.7.4.post1 --no-build-isolation
pip3 install omegaconf==2.4.0.dev3 hydra-core==1.4.0.dev1 antlr4-python3-runtime==4.11.0 vllm==0.7.3
pip3 install math-verify[antlr4_11_0]==0.7.0 fire deepspeed tensorboardX prettytable datasets transformers==4.49.0
pip3 install -e verl
```
#### Evaluation
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_ATTENTION_BACKEND=XFORMERS VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn python3 uni_eval.py \
--base_model zwhe99/DeepMath-Zero-7B \
--chat_template_name orz \
--system_prompt_name simplerl \
--output_dir \
--bf16 True \
--tensor_parallel_size 8 \
--data_id zwhe99/MATH \
--split math500 \
--max_model_len 32768 \
--temperature 0.6 \
--top_p 0.95 \
--n 16
```
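The command above samples several responses per problem (`--n 16`). The source does not state how `uni_eval.py` aggregates them, so the sketch below only shows one common convention, assumed here for illustration: average the correctness of the samples for each problem, then average across problems. The function name and the example flags are hypothetical.

```python
def mean_accuracy(correct_flags):
    """Average accuracy over the samples of each problem, then over problems."""
    per_problem = [sum(flags) / len(flags) for flags in correct_flags]
    return sum(per_problem) / len(per_problem)


# Made-up correctness flags: 2 problems x 4 samples (the command above uses n=16).
flags = [
    [True, True, False, True],    # 3 of 4 correct -> 0.75
    [False, False, True, False],  # 1 of 4 correct -> 0.25
]
score = mean_accuracy(flags)
print(score)  # (0.75 + 0.25) / 2 = 0.5
```

Averaging over many samples reduces the variance that sampling at a non-zero temperature introduces into a single-run score.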
#### Training
* Data Preparation
```shell
DATA_DIR=/path/to/your/data
python3 verl/examples/data_preprocess/deepmath_103k.py --local_dir $DATA_DIR
```
* Start Ray
```shell
# Head node (×1)
ray start --head --port=6379 --node-ip-address=$HEAD_ADDR --num-gpus=8
# Worker nodes (×7 or ×11)
ray start --address=$HEAD_ADDR:6379 --node-ip-address=$WORKER_ADDR --num-gpus=8
```
* Launch training at head node. See `scripts/train` for training scripts.
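As a sanity check before launching, the Ray commands above imply a fixed cluster size: one head node plus 7 or 11 workers, each started with `--num-gpus=8`. The short sketch below computes the expected total; the function name is hypothetical.

```python
GPUS_PER_NODE = 8  # matches --num-gpus=8 in the ray start commands


def expected_gpus(num_workers: int, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Total GPUs for one head node plus num_workers worker nodes."""
    return (1 + num_workers) * gpus_per_node


for workers in (7, 11):
    print(f"{workers} workers -> {expected_gpus(workers)} GPUs")
```

This gives 64 GPUs for 7 workers and 96 GPUs for 11 workers. Running `ray status` on the head node should report a matching GPU total once every worker has joined.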
## 🙏 Acknowledgements
This work would not have been possible without the following projects:
- **[verl](https://github.com/volcengine/verl)**: A very fast reinforcement learning framework.
- **[Vivacem/MMIQC](https://huggingface.co/datasets/Vivacem/MMIQC)**: A mixture of question-response pairs extracted from Mathematics Stack Exchange pages.
- **[TIGER-Lab/WebInstructSub](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub)**: Instruction data from MathStackExchange and ScienceStackExchange.
- **[AI-MO/NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT)**: Approximately 860k math problems.
## 📚 Citation
```bibtex
@article{deepmath,
title={DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning},
author={He, Zhiwei and Liang, Tian and Xu, Jiahao and Liu, Qiuzhi and Chen, Xingyu and Wang, Yue and Song, Linfeng and Yu, Dian and Liang, Zhenwen and Wang, Wenxuan and Zhang, Zhuosheng and Wang, Rui and Tu, Zhaopeng and Mi, Haitao and Yu, Dong},
year={2025},
eprint={2504.11456},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2504.11456},
}
```
Provider: maas
Created: 2025-04-17



