introvoyz041/OpenMathInstruct-2
Source: Hugging Face. Updated: 2025-12-12. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/introvoyz041/OpenMathInstruct-2
Resource description:
OpenMathInstruct-2 is a math instruction-tuning dataset of 14M problem-solution pairs generated with the Llama3.1-405B-Instruct model. It is built from the training-set problems of GSM8K and MATH using two methods: solution augmentation (generating chain-of-thought solutions) and problem-solution augmentation (generating new problems, then solutions for them). The dataset contains the following fields: problem (a problem from the GSM8K or MATH training set, or an augmented problem), generated_solution (the synthetically generated solution), expected_answer (the ground-truth answer for training-set problems, or the majority-voting answer for augmented problems), and problem_source (whether the problem comes directly from GSM8K or MATH, or is an augmented version). Fair-downsampled 1M, 2M, and 5M versions are also released for easier use.
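The four fields and the majority-voting answer described above can be illustrated with a small sketch. This is not the dataset's actual generation pipeline: the helper names `majority_answer` and `build_records`, the sample problem, and the source label `"augmented_gsm8k"` are hypothetical, and dropping solutions that disagree with the voted answer is an assumption the description does not confirm.

```python
from collections import Counter
from typing import TypedDict


class Record(TypedDict):
    """One row, using the four field names listed in the description."""
    problem: str
    generated_solution: str
    expected_answer: str
    problem_source: str


def majority_answer(candidate_answers: list[str]) -> str:
    """Return the most frequent final answer among sampled solutions."""
    if not candidate_answers:
        raise ValueError("need at least one candidate answer")
    # most_common(1) yields the single (answer, count) pair with the top count.
    return Counter(candidate_answers).most_common(1)[0][0]


def build_records(
    problem: str,
    solutions: list[tuple[str, str]],  # (solution text, final answer) pairs
    source: str,
) -> list[Record]:
    """Build rows for an augmented problem that has no ground-truth answer.

    Assumption: only solutions agreeing with the voted answer are kept.
    """
    voted = majority_answer([answer for _, answer in solutions])
    return [
        Record(
            problem=problem,
            generated_solution=text,
            expected_answer=voted,
            problem_source=source,  # hypothetical label string
        )
        for text, answer in solutions
        if answer == voted
    ]


if __name__ == "__main__":
    sampled = [("7 + 5 = 12", "12"), ("7 + 5 = 13", "13"), ("Adding gives 12", "12")]
    for row in build_records("What is 7 + 5?", sampled, "augmented_gsm8k"):
        print(row)
```

For problems taken directly from the GSM8K or MATH training sets, no vote is needed, since expected_answer is the ground-truth answer from the training set.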
Provider:
introvoyz041



