
InfinityMATH

ModelScope community, updated 2026-05-16, indexed 2024-09-14
Download link:
https://modelscope.cn/datasets/BAAI/InfinityMATH
Description:
## InfinityMATH

We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline decouples numbers from mathematical problems to synthesize number-independent programs, which enables efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. The fine-tuned models showed significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. They also exhibited high robustness on the [GSM8K+](https://huggingface.co/datasets/flagopen/gsm8kplus) and [MATH+](https://huggingface.co/datasets/flagopen/MATHplus) benchmarks, which are enhanced versions of the original test sets that differ only in their numerical values. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6335113375bed9932474315e/2ogh84TPxP5w3dLltPLnG.png)

## News

- 🔥🔥🔥 [2024/07/16] Our paper "InfinityMath: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning" has been accepted at CIKM 2024.

## Results

We perform 0-shot evaluation on 5 in-domain benchmarks (GSM8K, MATH, AQuA, Deepmind Mathematics, NumGLUE) and 4 out-of-domain benchmarks (SVAMP, SimulEq, SAT-Math, MMLU-Math), comparing base models (Llama2-7B, Aquila2-7B, Mistral-7B, Gemma2-2B, CodeLlama-7B) with their counterparts fine-tuned on WizardMath, MetaMath, MathInstruct, MathInstruct(PoT), InfinityMATH, etc.
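
To make the idea of number-independent programs from the introduction more concrete, here is a minimal sketch. It is not the authors' construction pipeline: the template text, the slot names `price` and `count`, and the helpers `solve` and `make_variants` are all hypothetical and chosen only for illustration. The point is that once the numbers are pulled out of a problem, one program can answer any number of re-sampled variants.

```python
import random

# Hypothetical problem template: concrete numbers are replaced by named slots.
TEMPLATE = (
    "A shop sells pencils at {price} dollars each. "
    "How much do {count} pencils cost?"
)


def solve(price: int, count: int) -> int:
    """Number-independent program: valid for any values of the slots."""
    return price * count


def make_variants(n: int, seed: int = 0) -> list[dict]:
    """Instantiate the template n times with freshly sampled numbers."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    variants = []
    for _ in range(n):
        values = {"price": rng.randint(1, 20), "count": rng.randint(2, 50)}
        variants.append(
            {
                "question": TEMPLATE.format(**values),
                "values": values,
                "answer": solve(**values),
            }
        )
    return variants


if __name__ == "__main__":
    for item in make_variants(3):
        print(item["question"], "->", item["answer"])
```

Because the program never hard-codes a value, scaling the data in this toy setting only requires sampling more numbers, which is the property the description attributes to the dataset.
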
The results are as follows:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6335113375bed9932474315e/LXOcpxq2KjcQPAD-jkLq1.png)

The results on GSM8K+ and MATH+ are as follows:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6335113375bed9932474315e/b1SMjeibgQUo7gZk3aP_m.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6335113375bed9932474315e/BRs8Wt-NTgJshmsSE34V5.png)

- **Partial Accuracy (Acc_partial)** - The fraction of items in which at least one of the three sub-questions is answered correctly.
- **Full Accuracy (Acc_full)** - The fraction of items in which all three sub-questions are answered correctly.
- **Consistency Ratio (CR)** - The ratio of full accuracy to partial accuracy.

## Disclaimer

The resources associated with this project, including code, data, and model weights, are restricted to academic research purposes and cannot be used for commercial purposes. The content produced by any version of Infinity-Preference is influenced by uncontrollable variables such as randomness, and therefore the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.

## Citation

If you find this repository useful, please consider giving it a star :star: and citing it:

```
@misc{zhang2024inifinitymath,
      title={InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning},
      author={Bo-Wen Zhang and Yan Yan and Lin Li and Guang Liu},
      year={2024},
      eprint={2408.07089},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2408.07089},
}
```
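
As a sketch of the Acc_partial, Acc_full, and CR definitions given under Results, the snippet below computes the three numbers from per-item correctness flags. The function name `robustness_metrics`, the input layout (one list of three booleans per item), and the choice to report CR as 0.0 when partial accuracy is zero are assumptions made for illustration, not part of the released evaluation code.

```python
def robustness_metrics(results: list[list[bool]]) -> dict:
    """Compute Acc_partial, Acc_full and CR.

    results[i] holds one boolean per sub-question of item i
    (three per item in the GSM8K+ / MATH+ setting described above).
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    # Item counts as partially correct if any sub-question is right.
    acc_partial = sum(any(item) for item in results) / n
    # Item counts as fully correct only if every sub-question is right.
    acc_full = sum(all(item) for item in results) / n
    # Assumption: define CR as 0.0 when no item is even partially correct.
    cr = acc_full / acc_partial if acc_partial > 0 else 0.0
    return {"acc_partial": acc_partial, "acc_full": acc_full, "cr": cr}


example = [
    [True, True, True],     # fully correct
    [True, False, True],    # partially correct
    [False, False, False],  # incorrect
    [True, True, True],     # fully correct
]
metrics = robustness_metrics(example)
print(metrics)
```

In this toy input three of four items are at least partially correct and two are fully correct, so Acc_partial is 0.75, Acc_full is 0.5, and CR is 0.5 / 0.75. A CR close to 1 means a model that solves one numerical variant of a problem usually solves the others too.
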

Provider: maas
Created: 2024-09-12