JingyangYi/SB_DPO
Source: Hugging Face. Updated: 2025-04-03. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/JingyangYi/SB_DPO
Description:
This dataset was generated with the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model, which produced 10 completions for each problem. From these completions, the shortest correct one is selected as the "chosen" response, and the longest one, whether correct or not, is selected as the "rejected" response. Problems with no correct completion are skipped, so some very difficult problems are absent from the dataset. The dataset has a single "train" split containing 14,953 examples.
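The selection rule above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the script used to build the dataset: the `Completion` type, the `build_pair` function, and the `prompt`/`chosen`/`rejected` keys are assumptions; correctness is assumed to come from an external answer checker; and "length" is assumed to mean character count, since the source does not say whether it is measured in characters or tokens.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Completion:
    """One sampled completion (hypothetical structure, for illustration only)."""
    text: str
    is_correct: bool  # assumed to be set by an external answer checker


def build_pair(prompt: str, completions: List[Completion]) -> Optional[dict]:
    """Build one preference pair following the described selection rule.

    Assumptions not stated in the source: length is measured in characters,
    and ties keep the earliest completion (Python's min/max return the first
    extreme element they encounter).
    """
    correct = [c for c in completions if c.is_correct]
    if not correct:
        # Problems with no correct completion are skipped entirely.
        return None
    chosen = min(correct, key=lambda c: len(c.text))        # shortest correct
    rejected = max(completions, key=lambda c: len(c.text))  # longest, correct or not
    return {"prompt": prompt, "chosen": chosen.text, "rejected": rejected.text}


# Usage with three toy completions instead of the 10 used for the dataset.
samples = [
    Completion("x = 4", True),
    Completion("Let me think... x = 4", True),
    Completion("Hmm, wait, maybe x = 5? Let me recheck.", False),
]
pair = build_pair("Solve 2x = 8.", samples)
print(pair["chosen"])  # x = 4
```

Because the rejected response is picked from all completions, it can be either correct or incorrect; the pair mainly encodes a preference for shorter correct reasoning over longer outputs.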
Provided by:
JingyangYi



