abacusai/MetaMath_DPO_FewShot

Name: abacusai/MetaMath_DPO_FewShot
Creator: abacusai
Published: 2024-02-26 16:00:50
License: apache-2.0 (per the dataset card metadata)

Source: Hugging Face. Updated 2024-02-26; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/abacusai/MetaMath_DPO_FewShot

Resource overview:

---
license: apache-2.0
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  splits:
  - name: train
    num_bytes: 1211199708
    num_examples: 393999
  - name: eval
    num_bytes: 3029624
    num_examples: 1000
  download_size: 561354683
  dataset_size: 1214229332
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: eval
    path: data/eval-*
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/_Z4fNfPl_Ix_gGT5Yoi0J.png)

# Dataset Card for "MetaMath_DPO_FewShot"

GSM8K (Cobbe et al., 2021) is a dataset of diverse grade school maths word problems, which has been commonly adopted as a measure of the math and reasoning skills of LLMs.

The [MetaMath](https://meta-math.github.io/) dataset is an extension of the training set of GSM8K using data augmentation. It is partitioned into queries and responses, where the query is a question involving mathematical calculation or reasoning, and the response is a logical series of steps and calculations that culminate in a final answer.

To construct our paired-preference version of MetaMath, we take the queries as prompts x and the responses as the preferred completions y_w. We create y_l by modifying the response by randomly corrupting one of the results of an intermediate calculation so that it is incorrect; however, we leave the answer untouched.

An example is as follows:

x: "What is the total cost of purchasing equipment for all sixteen players on the football team, considering that each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80?"

y_w: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 47. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"

y_l: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 52. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"

Our motivation in building this dataset is to align models towards being precise in intermediate calculations. This dataset has low edit distance: the normalised edit distance is approximately 6.5%.

The dataset is meant to be used to fine-tune LLMs (which have already undergone SFT) using the DPOP loss function. We used this dataset to create the [Smaug series of models](https://github.com/abacusai/smaug).

The dataset contains 393,999 training examples and 1,000 evaluation examples.

See more details in the [datasheet](https://github.com/abacusai/smaug/blob/main/datasheet.md), and in our paper: https://arxiv.org/abs/2402.13228.

Provider:

abacusai

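Summary of original information

Dataset card: "MetaMath_DPO_FewShot"

Dataset description

GSM8K (Cobbe et al., 2021) is a dataset of diverse grade school maths word problems, commonly used as a benchmark for the math and reasoning skills of large language models (LLMs). The MetaMath dataset is an extension of the GSM8K training set, built using data augmentation.

Dataset structure

The dataset is partitioned into queries and responses. A query is a question involving mathematical calculation or reasoning, and a response is a logical series of steps and calculations that culminates in a final answer. To construct the paired-preference version of MetaMath, the queries are used as prompts x and the responses as the preferred completions y_w. The rejected completion y_l is created by randomly corrupting the result of one intermediate calculation so that it is incorrect, while leaving the final answer unchanged.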
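The construction of y_l described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper name `corrupt_intermediate_result`, the regular expression, the rule that the last "= number" in a response is the final result, and the size of the perturbation are all hypothetical choices made for this example.

```python
import random
import re

# Assumption: an intermediate result looks like "= 47" or "= $752".
RESULT_PATTERN = re.compile(r"=\s*\$?(\d+(?:\.\d+)?)")


def corrupt_intermediate_result(response: str, rng: random.Random) -> str:
    """Return a copy of `response` with one intermediate result made incorrect.

    Hypothetical helper: the source does not say how values were selected
    or altered, only that one intermediate result is corrupted.
    """
    matches = list(RESULT_PATTERN.finditer(response))
    # Assumption: the last "= number" is the final result, so it is excluded.
    candidates = matches[:-1]
    if not candidates:
        return response  # nothing safe to corrupt

    target = rng.choice(candidates)
    original = target.group(1)
    # A strictly positive offset guarantees the new value differs.
    offset = rng.randint(1, 9)
    if "." in original:
        corrupted = f"{float(original) + offset:.2f}"
    else:
        corrupted = str(int(original) + offset)

    start, end = target.span(1)
    return response[:start] + corrupted + response[end:]


example_chosen = (
    "So the total cost for each player is $25 + $15.20 + $6.80 = 47. "
    "Since there are sixteen players on the football team, the total cost "
    "for all of them is 16 * $47 = $752. #### 752 The answer is: 752"
)
example_rejected = corrupt_intermediate_result(example_chosen, random.Random(0))
```

Passing a seeded `random.Random` keeps the corruption reproducible. Only the text after the selected equals sign changes, so later steps and the final answer line stay identical to the preferred completion, as in the card's own example.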

Example

x: "What is the total cost of purchasing equipment for all sixteen players on the football team, considering that each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80?"

y_w: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 47. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"

y_l: "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair of socks priced at $6.80. So the total cost for each player is $25 + $15.20 + $6.80 = 52. Since there are sixteen players on the football team, the total cost for all of them is 16 * $47 = $752. #### 752 The answer is: 752"
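The pair above differs only in one intermediate value (47 versus 52). A character-level edit distance makes that concrete. The sketch below assumes Levenshtein distance divided by the length of the longer string; the card does not state which unit or normalisation its dataset-level figure uses, so `normalised_edit_distance` is an illustrative definition only.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalised_edit_distance(a: str, b: str) -> float:
    """Assumed normalisation: distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# The example pair, built from its shared prefix and suffix.
prefix = (
    "Each player requires a $25 jersey, a $15.20 pair of shorts, and a pair "
    "of socks priced at $6.80. So the total cost for each player is "
    "$25 + $15.20 + $6.80 = "
)
suffix = (
    ". Since there are sixteen players on the football team, the total cost "
    "for all of them is 16 * $47 = $752. #### 752 The answer is: 752"
)
chosen = prefix + "47" + suffix
rejected = prefix + "52" + suffix

print(levenshtein(chosen, rejected))               # 2 (two substituted digits)
print(normalised_edit_distance(chosen, rejected))  # a small fraction of the length
```

For this single pair the two completions are two character substitutions apart, which is the kind of near-identical pairing the dataset is designed around.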

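Dataset purpose

The motivation for building this dataset is to align models towards being precise in intermediate calculations. The dataset has low edit distance: the normalised edit distance is approximately 6.5%. It is meant to be used to fine-tune LLMs that have already undergone SFT, using the DPOP loss function. This dataset was used to create the Smaug series of models.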
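For readers unfamiliar with DPOP, the sketch below shows a per-example loss on scalar sequence log-probabilities. It follows our reading of the DPO-Positive objective in the paper linked from the dataset card (arXiv 2402.13228): the usual DPO margin, minus a penalty that is active only when the policy assigns the preferred completion a lower log-probability than the reference model does. The function names and the default values of `beta` and `lam` are illustrative assumptions, not the authors' settings; consult the paper for the authoritative form.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpop_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the source
    lam: float = 5.0,   # illustrative value, not taken from the source
) -> float:
    """Per-example DPOP loss (sketch; inputs are summed sequence log-probs)."""
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    # Penalty is positive only if the policy has become less likely than the
    # reference to produce the preferred completion.
    penalty = max(0.0, ref_chosen_logp - policy_chosen_logp)
    margin = chosen_log_ratio - rejected_log_ratio - lam * penalty
    return -log_sigmoid(beta * margin)


# Policy equal to the reference: the margin is zero and the loss is log(2).
baseline = dpop_loss(-10.0, -12.0, -10.0, -12.0)

# Policy lowered the preferred completion's log-prob: the penalty raises the loss.
with_penalty = dpop_loss(-11.0, -14.0, -10.0, -12.0, lam=5.0)
without_penalty = dpop_loss(-11.0, -14.0, -10.0, -12.0, lam=0.0)
```

Setting `lam=0.0` reduces the expression to the standard DPO loss, which makes the effect of the extra term easy to isolate. In practice the log-probabilities would come from batched model outputs in a tensor library rather than Python floats.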

Dataset size

The dataset contains 393,999 training examples and 1,000 evaluation examples.
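A short sketch of loading the data and checking it against these counts is given below. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the dataset name in this listing; the helper names `split_sizes_match` and `load_and_check` are hypothetical. The split names (`train`, `eval`), column names (`prompt`, `chosen`, `rejected`), and byte counts come from the card's metadata.

```python
# Figures as stated in the dataset card metadata.
EXPECTED_EXAMPLES = {"train": 393_999, "eval": 1_000}
EXPECTED_BYTES = {"train": 1_211_199_708, "eval": 3_029_624}
DATASET_SIZE_BYTES = 1_214_229_332


def split_sizes_match(observed: dict) -> bool:
    """True if the observed split sizes equal the counts stated in the card."""
    return observed == EXPECTED_EXAMPLES


def load_and_check(repo_id: str = "abacusai/MetaMath_DPO_FewShot"):
    """Download the dataset and verify its split sizes (requires network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)  # the card lists a download size of 561354683 bytes
    observed = {name: split.num_rows for name, split in dataset.items()}
    if not split_sizes_match(observed):
        raise ValueError(f"unexpected split sizes: {observed}")
    return dataset


# Usage (not run here because it downloads the data):
#   dataset = load_and_check()
#   row = dataset["eval"][0]
#   print(row["prompt"], row["chosen"], row["rejected"], sep="\n---\n")
```

The per-split byte counts in the metadata sum to the stated `dataset_size`, which is a quick consistency check on the flattened front matter.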