abacusai/MetaMath_DPO_FewShot
Dataset Card for "MetaMath_DPO_FewShot"
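Dataset Description
GSM8K (Cobbe et al., 2021) is a dataset of diverse grade-school math word problems that is commonly used as a benchmark for the mathematical and reasoning abilities of large language models (LLMs). The MetaMath dataset is an extension of the GSM8K training set, built through data augmentation.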
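Dataset Structure
The dataset consists of queries and responses: a query is a question that involves mathematical calculation or reasoning, and a response is a sequence of logical reasoning and calculation steps that ends in an answer. To build the paired-preference version of MetaMath, we use the query as the prompt x and the response as the preferred completion y_w. The rejected completion y_l is created by randomly altering one of the intermediate calculation results so that it is incorrect, while leaving the final answer unchanged.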
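The construction of y_l described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `make_rejected`, the regular expression, and the choice of an integer offset are all assumptions made for the example. It also assumes that the last "= value" before the "####" marker restates the final answer and should therefore be left alone.

```python
import random
import re

# Assumed pattern for a computed result such as "= 47" or "= $752".
RESULT_PATTERN = re.compile(r"=\s*\$?(\d+(?:\.\d+)?)")


def make_rejected(chosen: str, rng: random.Random) -> str:
    """Return a copy of `chosen` with one intermediate result made incorrect."""
    # Everything from "####" onwards is the final answer and is never edited.
    body, sep, tail = chosen.partition("####")
    matches = list(RESULT_PATTERN.finditer(body))
    # Assumption: the last result restates the final answer, so skip it.
    candidates = matches[:-1]
    if not candidates:
        return chosen  # no intermediate result to corrupt
    target = rng.choice(candidates)
    old = target.group(1)
    # A non-zero integer offset guarantees the new value differs from the old.
    offset = rng.choice([d for d in range(-9, 10) if d != 0])
    new_value = float(old) + offset
    new = f"{new_value:.2f}" if "." in old else str(int(new_value))
    start, end = target.span(1)
    return body[:start] + new + body[end:] + sep + tail
```

Applied to a response with two results, such as "... = 47. ... 16 * $47 = $752. #### 752", the sketch changes only the 47 after the first equals sign and leaves both the later steps and the final answer intact, so later steps may no longer follow from the corrupted one.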
Example
x: "What is the total cost of purchasing equipment for all sixteen players on the football team, considering that each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80?"
y_w: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 47. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"
y_l: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 52. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"
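Dataset Purpose
This dataset was built to make models more precise in their intermediate calculations. The edit distance between the preferred and rejected completions is low: the normalized edit distance is approximately 6.5%. The dataset is intended for fine-tuning LLMs that have already undergone SFT, using the DPOP loss function. We used this dataset to create the Smaug series of models.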
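For readers who want to measure edit distance on their own preference pairs, here is one possible way to compute a normalized edit distance. The card does not say whether the 6.5% figure is measured over characters or tokens, or how it is normalized. This sketch assumes whitespace-separated tokens and division by the longer sequence length, so its numbers need not match the reported value. The function names are illustrative.

```python
def levenshtein(a, b) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def normalized_edit_distance(chosen: str, rejected: str) -> float:
    """Token-level edit distance divided by the longer token count (assumed convention)."""
    a, b = chosen.split(), rejected.split()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

Under this convention, a pair that differs in a single token out of four has a normalized distance of 0.25, and identical completions have a distance of 0.0.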
Dataset Size
The dataset contains 393,999 training examples and 1,000 evaluation examples.
More Details
For more details, see the datasheet and our paper: https://arxiv.org/abs/2402.13228




