AceMath-Instruct-Training-Data
Source: ModelScope community · Updated 2026-01-02 · Indexed 2025-01-25
Download link: https://modelscope.cn/datasets/AI-ModelScope/AceMath-Instruct-Training-Data
[website](https://research.nvidia.com/labs/adlr/acemath/) | [paper](https://arxiv.org/abs/2412.15084)
## AceMath-Instruct Training Data Card
We release all the datasets to train AceMath-1.5B/7B/72B-Instruct models. These models are built upon the Qwen2.5-Math-Base models through a multi-stage supervised fine-tuning (SFT) process. The fine-tuning begins with general-purpose SFT data (general_sft_stage1.parquet and general_sft_stage2.parquet) and is followed by math-specific SFT data (math_sft.parquet). In our experiments, fine-tuning the Qwen2.5-Math-Base models using only the math-specific SFT data also delivers competitive math reasoning performance.
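The stage order described above can be captured in a small helper that builds the file mapping for whichever stages you want to train on. This is a minimal sketch, not part of the released tooling: `STAGE_ORDER` and `stage_files` are hypothetical names, and the `data/` path prefix is assumed from the loading snippet in the usage section.

```python
# Stage order as described in the card: general-purpose SFT first,
# math-specific SFT last. The names below are illustrative helpers only.
STAGE_ORDER = ["general_sft_stage1", "general_sft_stage2", "math_sft"]


def stage_files(stages=None):
    """Map each requested stage to its parquet path, kept in training order."""
    wanted = set(STAGE_ORDER if stages is None else stages)
    unknown = wanted - set(STAGE_ORDER)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    # Iterate over STAGE_ORDER so the result is ordered regardless of input order.
    return {name: f"data/{name}.parquet" for name in STAGE_ORDER if name in wanted}


# Full multi-stage recipe, or the math-only variant mentioned in the card.
all_stages = stage_files()
math_only = stage_files(["math_sft"])
print(list(all_stages))
print(math_only)
```

The resulting dictionary can be passed as the `data_files` argument of `load_dataset`.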
AceMath-Instruct training datasets are <b>Built with Qwen</b> with math prompt outputs generated by Qwen2.5-Math-72B-Instruct. Outputs for other types of prompts are generated using GPT-4o-mini.
Here are the data statistics:
- general_sft_stage1: 2,261,687 samples (consisting of code & math SFT samples)
- general_sft_stage2: 1,634,573 samples (consisting of code & math & general-domain SFT samples)
- math_sft: 1,661,094 samples (only math SFT samples)
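To get a feel for the relative size of each file, the counts above can be summed and turned into shares. This is a small illustrative calculation; `SAMPLE_COUNTS` and `split_shares` are names introduced here. The total simply adds rows across files, and the card does not say whether any prompts appear in more than one file.

```python
# Sample counts copied from the statistics listed above.
SAMPLE_COUNTS = {
    "general_sft_stage1": 2_261_687,
    "general_sft_stage2": 1_634_573,
    "math_sft": 1_661_094,
}


def split_shares(counts):
    """Return each file's share of the total row count as a fraction in [0, 1]."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total_samples = sum(SAMPLE_COUNTS.values())
shares = split_shares(SAMPLE_COUNTS)

print(f"total rows: {total_samples:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```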
## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)
<p align="center">
<img src="./acemath-pic.png" alt="AceMath Benchmark Results" width="800">
</p>
We compare AceMath to leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) across a variety of math reasoning benchmarks, while coming close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6), by 3.6, 4.4, and 6.2 points respectively. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This excludes OpenAI's o1 model, which relies on scaled inference computation.
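The rm@8 metric mentioned above is a best-of-n selection: several candidate responses are scored by a reward model and the top-scoring one is evaluated. The following is a minimal sketch of that idea under stated assumptions, not the AceMath evaluation code. The scorer is a hypothetical lookup table standing in for a reward model, it scores a candidate answer alone (a real reward model would score the question together with the response), and correctness is reduced to exact string match.

```python
from typing import Callable, List, Sequence, Tuple


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (the first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def rm_at_n_accuracy(
    problems: List[Tuple[Sequence[str], str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of problems whose top-scored candidate equals the reference."""
    hits = sum(best_of_n(cands, score) == ref for cands, ref in problems)
    return hits / len(problems)


# Hypothetical scores standing in for a reward model's output.
TOY_SCORES = {"4": 0.9, "5": 0.2, "22": 0.1, "6": 0.3, "7": 0.8}


def toy_score(answer: str) -> float:
    return TOY_SCORES[answer]


# Each entry: (candidate answers, reference answer).
toy_problems = [
    (["5", "4", "22"], "4"),  # the scorer prefers the correct candidate
    (["6", "7"], "6"),        # the scorer prefers a wrong candidate
]

accuracy = rm_at_n_accuracy(toy_problems, toy_score)
print(accuracy)
```

With n = 8 candidates per problem, this selection rule corresponds to the "best of 8" setting described above.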
## How to use
```python
from datasets import load_dataset

data_files = {
    "general_sft_stage1": "data/general_sft_stage1.parquet",
    "general_sft_stage2": "data/general_sft_stage2.parquet",
    "math_sft": "data/math_sft.parquet",
}

# load the datasets
dataset = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_files=data_files,
    cache_dir="CACHE_DIR_TO_STORE_THE_DATASET"
)

# print example
print(dataset['math_sft'][0])

# example format
"""
{
    "messages": [
        {
            "role": "user",
            "content": "...",
        }
    ],
    "answer": "..."
}
"""
```
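For SFT pipelines that expect a single list of chat turns, a sample in the format above can be flattened into one conversation. This is a sketch under an assumption the card does not state explicitly: that `answer` holds the target response to the final turn in `messages`. The helper name `to_conversation` and the sample contents are hypothetical.

```python
from typing import Dict, List


def to_conversation(sample: Dict) -> List[Dict[str, str]]:
    """Return the sample's messages followed by its answer as an assistant turn.

    Assumes `answer` is the target response to the last entry in `messages`.
    The input sample is left unmodified.
    """
    assistant_turn = {"role": "assistant", "content": sample["answer"]}
    return [*sample["messages"], assistant_turn]


# Made-up sample following the documented format.
example = {
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "answer": "2 + 2 = 4",
}

conversation = to_conversation(example)
for turn in conversation:
    print(f"{turn['role']}: {turn['content']}")
```

The resulting list can then be passed to a tokenizer's chat template or any trainer that consumes role/content message lists.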
## All Resources
### AceMath Instruction Models
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
### AceMath Reward Models
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
### Evaluation & Training Data
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
### General Instruction Models
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
## Correspondence to
Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com)
## Citation
If you find our work helpful, we’d appreciate it if you could cite us.
<pre>
@article{acemath2024,
title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
journal={arXiv preprint},
year={2024}
}
</pre>
## License
AceMath-Instruct training datasets are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) of the data generated by OpenAI. We release this dataset under the [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.
Provider: maas
Created: 2025-01-23



