Curriculum_DPO_preferences
Source: arXiv. Indexed on 2025-09-30.
Download link:
https://huggingface.co/datasets/ServiceNow-AI/Curriculum_DPO_preferences
Description:
This dataset contains multiple sets of preference pairs built for Direct Preference Optimization (DPO) training, ordered from easy to hard according to response quality scores. It was used to train and evaluate the Curry-DPO method against standard DPO, with the reported results showing that Curry-DPO achieves significant performance improvements. The intended task is DPO training on these preference pairs.
Provided by:
ServiceNow-AI
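
The description says the preference pairs are ordered from easy to hard by response quality scores. The following is a minimal sketch of what such an ordering could look like, not the dataset's actual construction. It assumes that a pair is "easier" when the gap between the chosen and rejected scores is larger. The `PreferencePair` class, its field names, and the helpers `curriculum_order` and `curriculum_stages` are hypothetical and are not taken from the dataset's schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    """Hypothetical record layout; the real dataset columns may differ."""
    prompt: str
    chosen: str
    rejected: str
    chosen_score: float
    rejected_score: float

    @property
    def gap(self) -> float:
        # Assumption: a larger score gap means the preference is easier to learn.
        return self.chosen_score - self.rejected_score


def curriculum_order(pairs: List[PreferencePair]) -> List[PreferencePair]:
    """Return pairs sorted easy-to-hard, i.e. by descending score gap."""
    return sorted(pairs, key=lambda p: p.gap, reverse=True)


def curriculum_stages(pairs: List[PreferencePair], num_stages: int) -> List[List[PreferencePair]]:
    """Split the easy-to-hard ordering into at most `num_stages` consecutive chunks."""
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    ordered = curriculum_order(pairs)
    if not ordered:
        return []
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


if __name__ == "__main__":
    demo = [
        PreferencePair("q", "good", "close", 9.0, 8.0),
        PreferencePair("q", "good", "bad", 9.0, 2.0),
        PreferencePair("q", "good", "okay", 9.0, 5.0),
    ]
    for stage_index, stage in enumerate(curriculum_stages(demo, num_stages=2)):
        print(stage_index, [pair.rejected for pair in stage])
```

In a curriculum setup, each stage would be passed to a DPO training loop in turn, starting with the easiest pairs. The training step itself is left out here because it depends on the trainer in use.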



