OpenMathInstruct-2 Math Instruction Tuning Dataset
Source: HyperAI (超神经) | Updated: 2024-10-27 | Indexed: 2024-12-14
Download link:
https://hyper.ai/cn/datasets/35154
Resource overview:
OpenMathInstruct-2 is a large-scale open-source math instruction-tuning dataset released by NVIDIA in 2024 to accelerate progress in AI for mathematics. The accompanying paper is "OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data". The dataset contains 14 million question-answer pairs covering approximately 600,000 unique questions, making it nearly 8 times the size of the largest previous comparable dataset. A Llama-3.1-8B-Base model fine-tuned on OpenMathInstruct-2 scores 67.8% on the MATH dataset, 15.9 percentage points above Llama3.1-8B-Instruct (51.9%).
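The figures quoted above are easier to interpret with a little arithmetic. The following is a minimal sketch in Python that uses only the numbers stated in the overview; the variable names are illustrative and are not taken from the dataset or the paper. It shows the average number of question-answer pairs per unique question and the difference between the absolute gain (in percentage points) and the relative gain on MATH.

```python
# Figures quoted in the overview (approximate where the source is approximate).
total_pairs = 14_000_000      # question-answer pairs
unique_questions = 600_000    # roughly 600,000 unique questions
instruct_acc = 51.9           # Llama3.1-8B-Instruct accuracy on MATH (%)
finetuned_acc = 67.8          # fine-tuned Llama-3.1-8B-Base accuracy on MATH (%)

# Average number of question-answer pairs per unique question.
pairs_per_question = total_pairs / unique_questions

# Absolute gain (percentage points) versus relative gain (percent of baseline).
absolute_gain = finetuned_acc - instruct_acc
relative_gain = absolute_gain / instruct_acc * 100

print(f"{pairs_per_question:.1f} pairs per unique question")
print(f"{absolute_gain:.1f} percentage points absolute gain")
print(f"{relative_gain:.1f}% relative gain")
```

Under these figures, each unique question appears in about 23 pairs on average, and the 15.9-point absolute gain corresponds to a relative gain of roughly 30.6% over the Llama3.1-8B-Instruct baseline.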
Created:
2024-10-22
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
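OpenMathInstruct-2 is a large-scale open-source math instruction-tuning dataset released by NVIDIA in 2024. It contains 14 million question-answer pairs (about 600,000 unique questions) and is intended to accelerate progress in AI for mathematical reasoning. Fine-tuning on the dataset substantially improves math problem-solving performance: for example, a fine-tuned Llama-3.1-8B-Base scores 67.8% on the MATH dataset, 15.9 percentage points above Llama3.1-8B-Instruct (51.9%).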
The content above was collected and summarized by 遇见数据集.



