AceMath-RM-Training-Data

Name: AceMath-RM-Training-Data
Creator: maas
Published: 2025-11-07 16:21:13
License: 暂无描述

魔搭社区2025-11-07 更新2025-01-25 收录

下载链接：

https://modelscope.cn/datasets/nv-community/AceMath-RM-Training-Data

下载链接

链接失效反馈

官方服务：

资源简介：

[website](https://research.nvidia.com/labs/adlr/acemath/) | [paper](https://arxiv.org/abs/2412.15084) ## AceMath RM Training Data Card We release the AceMath RM Training data that is used to train the AceMath-7/72B-RM for math outcome reward modeling. Below is the data statistics: - number of unique math questions: 356,058 - number of examples: 2,136,348 (each questions have 6 different responses) ## Benchmark Results (AceMath-Instruct + AceMath-72B-RM) <p align="center"> <img src="./acemath-pic.png" alt="AceMath Benchmark Results" width="800"> </p> We compare AceMath to leading proprietary and open-access math models in above Table. Our AceMath-7B-Instruct, largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (Average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the performance of 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4) and Claude 3.5 Sonnet (65.6) by a margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This excludes OpenAI’s o1 model, which relies on scaled inference computation. ## Reward Model Benchmark Results | Model | GSM8K | MATH500 | Minerva Math | GaoKao 2023 En | Olympiad Bench | College Math | MMLU STEM | Avg. | |---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------| | majority@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 | | Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 | | Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 | | AceMath-7B-RM (Ours) | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 | | AceMath-72B-RM (Ours) | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 | *Reward model evaluation on [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench). The average results (rm@8) of reward models on math benchmarks, randomly sample 8 responses from 64 candidates with 100 random seeds. Response candidates are generated from a pool of 8 LLMs. ## How to use ```python from datasets import load_dataset # Load the dataset from Hugging Face Hub dataset = load_dataset("nvidia/AceMath-RM-Training-Data") # Print the first example print(dataset['train'][0]) """ { # Unique key for the question 'qid': '...', # Conversation between system, user and assistant 'message': [ { # System prompt setting up the conversation context 'role': 'system', 'content': '...' }, { # User's math question (truncated in example) 'role': 'user', 'content': "..." }, { # Assistant's step-by-step solution (truncated in example) 'role': 'assistant', 'content': "..." } ], # Binary score indicating solution (1 = correct/good, 0 = incorrect/poor) 'label': 1 } ``` ## All Resources ### AceMath Instruction Models - [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct) ### AceMath Reward Models - [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM) ### Evaluation & Training Data - [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data) ### General Instruction Models - [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B) ## Correspondence to Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com) ## Citation If you find our work helpful, we’d appreciate it if you could cite us. <pre> @article{acemath2024, title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling}, author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei}, journal={arXiv preprint}, year={2024} } </pre> ## License AceMath-RM training dataets are for non-commercial use only, subject to [Terms of Use](https://openai.com/policies/row-terms-of-use/) of the data generated by OpenAI. We put this dataset under the license of [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0).

[官网](https://research.nvidia.com/labs/adlr/acemath/) | [论文](https://arxiv.org/abs/2412.15084) ## AceMath RM 训练数据卡片我们发布了用于训练AceMath-7/72B-RM（数学结果奖励建模模型）的AceMath RM训练数据集。以下为数据统计信息： - 唯一数学问题数量：356,058道 - 样本总数：2,136,348个（每个问题对应6种不同解答） ## 基准测试结果（AceMath-Instruct + AceMath-72B-RM） <p align="center"><img src="./acemath-pic.png" alt="AceMath 基准测试结果" width="800"></p> 我们在上方表格中将AceMath与主流闭源及开源数学模型进行了对比。我们的AceMath-7B-Instruct在各类数学推理基准测试中大幅超越此前的最优模型Qwen2.5-Math-7B-Instruct（平均pass@1：67.2 vs 62.9），同时性能接近参数量为其10倍的Qwen2.5-Math-72B-Instruct（67.2 vs 68.2）。值得注意的是，我们的AceMath-72B-Instruct显著优于当前最优的Qwen2.5-Math-72B-Instruct（71.8 vs 68.2）、GPT-4o（67.4）以及Claude 3.5 Sonnet（65.6）。我们还报告了我们的奖励模型AceMath-72B-RM所达成的rm@8准确率（8次采样最优结果），该结果在上述推理基准测试中创下了新纪录。此对比未纳入依赖规模化推理计算的OpenAI o1模型。 ## 奖励模型基准测试结果 | 模型名称 | GSM8K | MATH500 | Minerva 数学基准 | 2023年高考（英文卷） | 奥林匹克竞赛基准 | 大学数学 | MMLU 理工科 | 平均值 | |---------------------------|-------|---------|--------------|----------------|-----------------|--------------|-----------|--------| | 多数投票@8 | 96.22 | 83.11 | 41.20 | 68.21 | 42.69 | 45.01 | 78.21 | 64.95 | | Skywork-o1-Open-PRM-Qwen-2.5-7B | 96.92 | 86.64 | 41.00 | 72.34 | 46.50 | 46.30 | 74.01 | 66.24 | | Qwen2.5-Math-RM-72B | 96.61 | 86.63 | 43.60 | 73.62 | 47.21 | 47.29 | 84.24 | 68.46 | | AceMath-7B-RM（我们的模型） | 96.66 | 85.47 | 41.96 | 73.82 | 46.81 | 46.37 | 80.78 | 67.41 | | AceMath-72B-RM（我们的模型） | 97.23 | 86.72 | 45.06 | 74.69 | 49.23 | 46.79 | 87.01 | 69.53 | *本奖励模型评估基于[AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench)。奖励模型在数学基准测试上的平均结果（rm@8）为：从64个候选解答中随机采样8次并取最优结果，共执行100次随机种子采样。候选解答由8个大语言模型（LLM）构成的生成池生成。 ## 使用方法 python from datasets import load_dataset # 从Hugging Face Hub加载数据集 dataset = load_dataset("nvidia/AceMath-RM-Training-Data") # 打印首个样本 print(dataset['train'][0]) """ { # 问题的唯一标识 'qid': '...', # 系统、用户与助手的对话序列 'message': [ { # 用于配置对话上下文的系统提示词 'role': 'system', 'content': '...' }, { # 用户提出的数学问题（示例中已做截断处理） 'role': 'user', 'content': "..." }, { # 助手的分步解答过程（示例中已做截断处理） 'role': 'assistant', 'content': "..." } ], # 表示解答质量的二元标签：1代表解答正确/优质，0代表解答错误/劣质 'label': 1 } """ ## 全部资源 ### AceMath 指令模型 - [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct) ### AceMath 奖励模型 - [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM) ### 评估与训练数据 - [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct 训练数据](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM 训练数据](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data) ### 通用指令模型 - [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B) ## 联系方式致：刘子涵（zihanl@nvidia.com）、陈阳（yachen@nvidia.com）、平伟（wping@nvidia.com） ## 引用格式若您的研究得益于本工作，敬请引用： <pre> @article{acemath2024, title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling}, author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei}, journal={arXiv preprint}, year={2024} } </pre> ## 许可协议 AceMath-RM训练数据集仅可用于非商业用途，需遵守OpenAI生成数据的[使用条款](https://openai.com/policies/row-terms-of-use/)。本数据集采用[知识共享署名-非商业性使用4.0国际许可协议（CC-BY-NC-4.0）](https://spdx.org/licenses/CC-BY-NC-4.0)进行授权。

提供机构：

maas

创建时间：

2025-01-20

5,000+

优质数据集

54 个

任务类型

进入经典数据集