
AceMath-Instruct-Training-Data

ModelScope community · Updated 2026-01-02 · Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/AI-ModelScope/AceMath-Instruct-Training-Data
Description:
[website](https://research.nvidia.com/labs/adlr/acemath/) | [paper](https://arxiv.org/abs/2412.15084)

## AceMath-Instruct Training Data Card

We release all the datasets used to train the AceMath-1.5B/7B/72B-Instruct models. These models are built upon the Qwen2.5-Math-Base models through a multi-stage supervised fine-tuning (SFT) process. Fine-tuning begins with general-purpose SFT data (general_sft_stage1.parquet and general_sft_stage2.parquet) and is followed by math-specific SFT data (math_sft.parquet). In our experiments, fine-tuning the Qwen2.5-Math-Base models using only the math-specific SFT data also delivers competitive math reasoning performance.

AceMath-Instruct training datasets are <b>Built with Qwen</b>, with math prompt outputs generated by Qwen2.5-Math-72B-Instruct. Outputs for other types of prompts are generated using GPT-4o-mini.

Here are the data statistics:
- general_sft_stage1: 2,261,687 samples (consisting of code & math SFT samples)
- general_sft_stage2: 1,634,573 samples (consisting of code & math & general-domain SFT samples)
- math_sft: 1,661,094 samples (only math SFT samples)

## Benchmark Results (AceMath-Instruct + AceMath-72B-RM)

<p align="center">
  <img src="./acemath-pic.png" alt="AceMath Benchmark Results" width="800">
</p>

We compare AceMath to leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks, while coming close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks. This excludes OpenAI’s o1 model, which relies on scaled inference computation.

## How to use

```python
from datasets import load_dataset

data_files = {
    "general_sft_stage1": "data/general_sft_stage1.parquet",
    "general_sft_stage2": "data/general_sft_stage2.parquet",
    "math_sft": "data/math_sft.parquet",
}

# load the datasets
dataset = load_dataset(
    "nvidia/AceMath-Instruct-Training-Data",
    data_files=data_files,
    cache_dir="CACHE_DIR_TO_STORE_THE_DATASET"
)

# print example
print(dataset['math_sft'][0])

# example format
"""
{
    "messages": [
        {
            "role": "user",
            "content": "...",
        }
    ],
    "answer": "..."
}
"""
```

## All Resources

### AceMath Instruction Models
- [AceMath-1.5B-Instruct](https://huggingface.co/nvidia/AceMath-1.5B-Instruct), [AceMath-7B-Instruct](https://huggingface.co/nvidia/AceMath-7B-Instruct), [AceMath-72B-Instruct](https://huggingface.co/nvidia/AceMath-72B-Instruct)

### AceMath Reward Models
- [AceMath-7B-RM](https://huggingface.co/nvidia/AceMath-7B-RM), [AceMath-72B-RM](https://huggingface.co/nvidia/AceMath-72B-RM)

### Evaluation & Training Data
- [AceMath-RewardBench](https://huggingface.co/datasets/nvidia/AceMath-RewardBench), [AceMath-Instruct Training Data](https://huggingface.co/datasets/nvidia/AceMath-Instruct-Training-Data), [AceMath-RM Training Data](https://huggingface.co/datasets/nvidia/AceMath-RM-Training-Data)

### General Instruction Models
- [AceInstruct-1.5B](https://huggingface.co/nvidia/AceInstruct-1.5B), [AceInstruct-7B](https://huggingface.co/nvidia/AceInstruct-7B), [AceInstruct-72B](https://huggingface.co/nvidia/AceInstruct-72B)

## Correspondence to

Zihan Liu (zihanl@nvidia.com), Yang Chen (yachen@nvidia.com), Wei Ping (wping@nvidia.com)

## Citation

If you find our work helpful, we’d appreciate it if you could cite us.

<pre>
@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}
</pre>

## License

AceMath-Instruct training datasets are for non-commercial use only, subject to the [Terms of Use](https://openai.com/policies/row-terms-of-use/) of the data generated by OpenAI. We put this dataset under the license of [Creative Commons Attribution: Non-Commercial 4.0 International](https://spdx.org/licenses/CC-BY-NC-4.0).
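
## Example: preparing records for SFT

The record format shown under "How to use" pairs a `messages` list with an `answer` string. The sketch below shows one way such a record might be turned into a full conversation for supervised fine-tuning, and sums the sample counts listed in the data statistics. It rests on assumptions that the card does not confirm: that `answer` is the target assistant response for the preceding `messages`, and that appending it as an `assistant` turn is a suitable training format. The helper names `to_chat_example` and `stage_shares` and the sample record are hypothetical, made up for illustration.

```python
from typing import Any

# Sample counts copied from the data statistics in this card.
SAMPLE_COUNTS = {
    "general_sft_stage1": 2_261_687,
    "general_sft_stage2": 1_634_573,
    "math_sft": 1_661_094,
}


def to_chat_example(record: dict[str, Any]) -> list[dict[str, str]]:
    """Return the conversation with the answer appended as an assistant turn.

    Assumption: "answer" holds the target response for "messages".
    The input record is not modified.
    """
    messages = [dict(turn) for turn in record["messages"]]
    messages.append({"role": "assistant", "content": record["answer"]})
    return messages


def stage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each file's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical record in the documented format (not taken from the dataset).
    record = {
        "messages": [{"role": "user", "content": "What is 2 + 3?"}],
        "answer": "2 + 3 = 5",
    }
    for turn in to_chat_example(record):
        print(f"{turn['role']}: {turn['content']}")

    print("total samples:", sum(SAMPLE_COUNTS.values()))
    for name, share in stage_shares(SAMPLE_COUNTS).items():
        print(f"{name}: {share:.1%}")
```

The same `to_chat_example` function could be passed to `dataset.map` after loading, though whether the released models were trained with exactly this layout is not stated in the card.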

Provider:
maas
Created:
2025-01-23