AceMath-Instruct-Training-Data

Name: AceMath-Instruct-Training-Data
Creator: maas
Published: 2026-01-02 16:21:15
License: no description provided

ModelScope community · updated 2026-01-02 · indexed 2025-01-25

Download link:

https://modelscope.cn/datasets/AI-ModelScope/AceMath-Instruct-Training-Data


Resource description:

[website](https://research.nvidia.com/labs/adlr/acemath/) | [paper](https://arxiv.org/abs/2412.15084)

## AceMath-Instruct Training Data Card

We release all the datasets used to train the AceMath-1.5B/7B/72B-Instruct models. These models are built upon the Qwen2.5-Math-Base models through a multi-stage supervised fine-tuning (SFT) process. Fine-tuning begins with general-purpose SFT data (general_sft_stage1.parquet and general_sft_stage2.parquet) and is followed by math-specific SFT data (math_sft.parquet). In our experiments, fine-tuning the Qwen2.5-Math-Base models using only the math-specific SFT data also delivers competitive math reasoning performance.

AceMath-Instruct training datasets are Built with Qwen, with math prompt outputs generated by Qwen2.5-Math-72B-Instruct. Outputs for other types of prompts are generated using GPT-4o-mini.

Here are the data statistics:

- general_sft_stage1: 2,261,687 samples (consisting of code & math SFT samples)
- general_sft_stage2: 1,634,573 samples (consisting of code & math & general-domain SFT samples)
- math_sft: 1,661,094 samples (only math SFT samples)

## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)

<img src="./acemath-pic.png" alt="AceMath Benchmark Results" width="800">

We compare AceMath to leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct substantially outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4) and Claude 3.5 Sonnet (65.6) by a margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This excludes OpenAI’s o1 model, which relies on scaled inference computation.
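
The rm@8 metric mentioned above is a best-of-n measure: for each problem, a reward model scores several candidate solutions and the top-scored one is checked for correctness. The following is a minimal sketch of that general idea, not the authors' evaluation code. The function name `rm_at_n` and the `(reward_score, is_correct)` input layout are assumptions made for illustration, and the card does not state details such as how candidates are sampled or whether results are averaged over multiple runs.

```python
from typing import Sequence, Tuple

# Hypothetical layout: each candidate solution is a (reward_score, is_correct) pair.
Candidate = Tuple[float, bool]


def rm_at_n(problems: Sequence[Sequence[Candidate]], n: int = 8) -> float:
    """Fraction of problems whose highest-scored candidate (among the first n) is correct."""
    if not problems:
        raise ValueError("need at least one problem")
    hits = 0
    for candidates in problems:
        pool = candidates[:n]  # consider at most n candidates per problem
        if not pool:
            raise ValueError("every problem needs at least one candidate")
        # Pick the candidate the reward model scores highest.
        best = max(pool, key=lambda c: c[0])
        hits += int(best[1])
    return hits / len(problems)


# Toy data: the reward model picks a correct answer for the first problem only.
toy_problems = [
    [(0.1, False), (0.9, True)],
    [(0.8, False), (0.2, True)],
]
toy_accuracy = rm_at_n(toy_problems, n=8)
```

With the toy data, one of the two top-scored candidates is correct, so `toy_accuracy` is 0.5.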
## How to use

```python
from datasets import load_dataset

data_files = {
    "general_sft_stage1": "data/general_sft_stage1.parquet",
    "general_sft_stage2": "data/general_sft_stage2.parquet",
    "math_sft": "data/math_sft.parquet",
}

# load the datasets
dataset = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_files=data_files,
    cache_dir="CACHE_DIR_TO_STORE_THE_DATASET"
)

# print example
print(dataset['math_sft'][0])

# example format
"""
{
    "messages": [
        {
            "role": "user",
            "content": "...",
        }
    ],
    "answer": "..."
}
"""
```

## All Resources

### AceMath Instruction Models

- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

### AceMath Reward Models

- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

### Evaluation & Training Data

- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

### General Instruction Models

- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

## Correspondence to

Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com)

## Citation

If you find our work helpful, we’d appreciate it if you could cite us.
<pre>
@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}
</pre>

## License

AceMath-Instruct training datasets are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) of the data generated by OpenAI. We put this dataset under the [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0) license.
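
As a follow-up to the example format shown under How to use, the sketch below turns one sample into a list of chat turns suitable for an SFT pipeline. It assumes that the `answer` field holds the target response to the final user message. The card shows the field but does not spell out its role, so treat that as an assumption. The helper name `to_chat_turns` and the toy sample are hypothetical and not part of the dataset or its tooling.

```python
from typing import Any, Dict, List


def to_chat_turns(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the prompt turns followed by the answer as an assistant turn.

    Assumption: `answer` is the target response for the conversation in `messages`.
    """
    turns = [
        {"role": message["role"], "content": message["content"]}
        for message in sample["messages"]
    ]
    turns.append({"role": "assistant", "content": sample["answer"]})
    return turns


# Toy sample that mirrors the documented format; the content is made up.
toy_sample = {
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "answer": "4",
}
chat = to_chat_turns(toy_sample)
```

The resulting list can be passed to whatever chat template the chosen tokenizer provides. The original sample is left unmodified.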


Provider:

maas

Created:

2025-01-23
