GenRM/MetaMath_DPO_FewShot-abacusai
Source: Hugging Face. Updated 2025-05-11. Indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/GenRM/MetaMath_DPO_FewShot-abacusai
Description:
MetaMath_DPO_FewShot is an extension of the GSM8K dataset of diverse grade-school math word problems. Each example consists of a query, which is a question involving mathematical calculation or reasoning, and a response, which is a logical series of steps and calculations leading to a final answer. This is a paired-preference version of the data: each query comes with one correct response and one incorrect response created by randomly corrupting the result of an intermediate calculation. The purpose is to align models to be precise in their intermediate calculations. The paired responses have a low edit distance, and the dataset is intended for fine-tuning LLMs that have already undergone SFT, using the DPOP loss function. It contains 393,999 training examples and 1,000 evaluation examples.
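The corruption step described above can be illustrated with a minimal sketch. It is not the authors' actual procedure: the `= <integer>` pattern, the rule that the last line holds the final answer, the 1 to 9 offset, and the `prompt` / `chosen` / `rejected` field names are all assumptions made for illustration.

```python
import random
import re

# Assumed pattern for a calculation result such as "= 12" (illustrative only).
RESULT = re.compile(r"=\s*(\d+)")


def corrupt_intermediate(response: str, rng: random.Random) -> str:
    """Return a copy of `response` with one intermediate result changed."""
    lines = response.split("\n")
    # Candidates: every "= <int>" on any line except the last one, which is
    # assumed to state the final answer and is left untouched.
    candidates = [
        (i, m)
        for i, line in enumerate(lines[:-1])
        for m in RESULT.finditer(line)
    ]
    if not candidates:
        return response  # nothing to corrupt
    i, m = rng.choice(candidates)
    # Adding a nonzero offset guarantees the new value differs from the old one.
    wrong = int(m.group(1)) + rng.randint(1, 9)
    lines[i] = lines[i][: m.start(1)] + str(wrong) + lines[i][m.end(1):]
    return "\n".join(lines)


def make_pair(query: str, response: str, seed: int = 0) -> dict:
    """Build one preference record (field names are hypothetical)."""
    rng = random.Random(seed)
    return {
        "prompt": query,
        "chosen": response,
        "rejected": corrupt_intermediate(response, rng),
    }
```

Because only a single number changes, the chosen and rejected responses stay close in edit distance, which matches the property the description highlights.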
Provider:
GenRM



