AceMath-RM-Training-Data
收藏魔搭社区2025-11-07 更新2025-01-25 收录
下载链接:
https://modelscope.cn/datasets/nv-community/AceMath-RM-Training-Data
下载链接
链接失效反馈官方服务:
资源简介:
[website](https://research.nvidia.com/labs/adlr/acemath/) | [paper](https://arxiv.org/abs/2412.15084)
## AceMath RM Training Data Card
We release the AceMath RM Training data that is used to train the AceMath-7/72B-RM for math outcome reward modeling. Below is the data statistics:
- number of unique math questions: 356,058
- number of examples: 2,136,348 (each questions have 6 different responses)
## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)
<p align="center">
<img src="./acemath-pic.png" alt="AceMath Benchmark Results" width="800">
</p>
We compare AceMath to leading proprietary and open-access math models in above Table. Our AceMath-7B-Instruct, largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (Average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the performance of 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4) and Claude 3.5 Sonnet (65.6) by a margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This excludes OpenAI’s o1 model, which relies on scaled inference computation.
## Reward Model Benchmark Results
| Model | GSM8K | MATH500 | Minerva Math | GaoKao 2023 En | Olympiad Bench | College Math | MMLU STEM | Avg. |
|---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------|
| majority@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 |
| Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 |
| Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 |
| AceMath-7B-RM (Ours) | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 |
| AceMath-72B-RM (Ours) | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 |
*Reward model evaluation on [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench). The average results (rm@8) of reward models on math benchmarks, randomly sample 8 responses from 64 candidates with 100 random seeds. Response candidates are generated from a pool of 8 LLMs.
## How to use
```python
from datasets import load_dataset
# Load the dataset from Hugging Face Hub
dataset = load_dataset("nvidia/AceMath-RM-Training-Data")
# Print the first example
print(dataset['train'][0])
"""
{
# Unique key for the question
'qid': '...',
# Conversation between system, user and assistant
'message': [
{
# System prompt setting up the conversation context
'role': 'system',
'content': '...'
},
{
# User's math question (truncated in example)
'role': 'user',
'content': "..."
},
{
# Assistant's step-by-step solution (truncated in example)
'role': 'assistant',
'content': "..."
}
],
# Binary score indicating solution (1 = correct/good, 0 = incorrect/poor)
'label': 1
}
```
## All Resources
### AceMath Instruction Models
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
### AceMath Reward Models
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
### Evaluation & Training Data
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
### General Instruction Models
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
## Correspondence to
Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com)
## Citation
If you find our work helpful, we’d appreciate it if you could cite us.
<pre>
@article{acemath2024,
title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
journal={arXiv preprint},
year={2024}
}
</pre>
## License
AceMath-RM training dataets are for non-commercial use only, subject to [Terms of Use](https://openai.com/policies/row-terms-of-use/) of the data generated by OpenAI. We put this dataset under the license of [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0).
[官网](https://research.nvidia.com/labs/adlr/acemath/) | [论文](https://arxiv.org/abs/2412.15084)
## AceMath RM 训练数据卡片
我们发布了用于训练AceMath-7/72B-RM(数学结果奖励建模模型)的AceMath RM训练数据集。以下为数据统计信息:
- 唯一数学问题数量:356,058道
- 样本总数:2,136,348个(每个问题对应6种不同解答)
## 基准测试结果(AceMath-Instruct + AceMath-72B-RM)
<p align="center"><img src="./acemath-pic.png" alt="AceMath 基准测试结果" width="800"></p>
我们在上方表格中将AceMath与主流闭源及开源数学模型进行了对比。我们的AceMath-7B-Instruct在各类数学推理基准测试中大幅超越此前的最优模型Qwen2.5-Math-7B-Instruct(平均pass@1:67.2 vs 62.9),同时性能接近参数量为其10倍的Qwen2.5-Math-72B-Instruct(67.2 vs 68.2)。值得注意的是,我们的AceMath-72B-Instruct显著优于当前最优的Qwen2.5-Math-72B-Instruct(71.8 vs 68.2)、GPT-4o(67.4)以及Claude 3.5 Sonnet(65.6)。我们还报告了我们的奖励模型AceMath-72B-RM所达成的rm@8准确率(8次采样最优结果),该结果在上述推理基准测试中创下了新纪录。此对比未纳入依赖规模化推理计算的OpenAI o1模型。
## 奖励模型基准测试结果
| 模型名称 | GSM8K | MATH500 | Minerva 数学基准 | 2023年高考(英文卷) | 奥林匹克竞赛基准 | 大学数学 | MMLU 理工科 | 平均值 |
|---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------|
| 多数投票@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 |
| Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 |
| Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 |
| AceMath-7B-RM(我们的模型) | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 |
| AceMath-72B-RM(我们的模型) | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 |
*本奖励模型评估基于[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)。奖励模型在数学基准测试上的平均结果(rm@8)为:从64个候选解答中随机采样8次并取最优结果,共执行100次随机种子采样。候选解答由8个大语言模型(LLM)构成的生成池生成。
## 使用方法
python
from datasets import load_dataset
# 从Hugging Face Hub加载数据集
dataset = load_dataset("nvidia/AceMath-RM-Training-Data")
# 打印首个样本
print(dataset['train'][0])
"""
{
# 问题的唯一标识
'qid': '...',
# 系统、用户与助手的对话序列
'message': [
{
# 用于配置对话上下文的系统提示词
'role': 'system',
'content': '...'
},
{
# 用户提出的数学问题(示例中已做截断处理)
'role': 'user',
'content': "..."
},
{
# 助手的分步解答过程(示例中已做截断处理)
'role': 'assistant',
'content': "..."
}
],
# 表示解答质量的二元标签:1代表解答正确/优质,0代表解答错误/劣质
'label': 1
}
"""
## 全部资源
### AceMath 指令模型
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)
### AceMath 奖励模型
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)
### 评估与训练数据
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct 训练数据](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM 训练数据](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)
### 通用指令模型
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)
## 联系方式
致:刘子涵(zihanl@nvidia.com)、陈阳(yachen@nvidia.com)、平伟(wping@nvidia.com)
## 引用格式
若您的研究得益于本工作,敬请引用:
<pre>
@article{acemath2024,
title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
journal={arXiv preprint},
year={2024}
}
</pre>
## 许可协议
AceMath-RM训练数据集仅可用于非商业用途,需遵守OpenAI生成数据的[使用条款](https://openai.com/policies/row-terms-of-use/)。本数据集采用[知识共享署名-非商业性使用4.0国际许可协议(CC-BY-NC-4.0)](https://spdx.org/licenses/CC-BY-NC-4.0)进行授权。
提供机构:
maas
创建时间:
2025-01-20



