
Curriculum_DPO_preferences

Source: ModelScope community. Updated 2025-12-05. Indexed 2025-02-01.
Download link:
https://modelscope.cn/datasets/ServiceNow-AI/Curriculum_DPO_preferences
Resource overview:
# Curriculum DPO Preference Pairs

This repository provides the curriculum DPO preference pairs used in the paper [Curri-DPO](https://aclanthology.org/2024.findings-emnlp.754/), which explores enhancing model alignment through curriculum learning and ranked preferences.

## Datasets

### Ultrafeedback

The Ultrafeedback dataset contains 64K preference pairs. We randomly sample 5K pairs and rank the responses for each prompt, organizing them into three difficulty levels (*easy*, *medium*, and *hard*) based on response scores.

### Open-Assistant

The Open-Assistant dataset consists of human-annotated conversation trees. We use the hierarchical rankings within these trees to categorize preference pairs into *easy*, *medium*, and *hard* levels.

For more information, please refer to the paper: [Enhancing Alignment using Curriculum Learning & Ranked Preferences](https://aclanthology.org/2024.findings-emnlp.754/).

## Data Format

Each data entry is structured as a conversation between a user and an assistant, organized by difficulty level. The format is as follows:

```python
{
    "easy": {
        "conversation": [
            {
                "role": "user",
                "content": ""
            },
            {
                "role": "assistant",
                "chosen_content": "",
                "rejected_content": ""
            }
        ]
    },
    "medium": {
        "conversation": [
            {
                "role": "user",
                "content": ""
            },
            {
                "role": "assistant",
                "chosen_content": "",
                "rejected_content": ""
            }
        ]
    },
    "hard": {
        "conversation": [
            {
                "role": "user",
                "content": ""
            },
            {
                "role": "assistant",
                "chosen_content": "",
                "rejected_content": ""
            }
        ]
    }
}
```

- **Difficulty Levels** (`easy`, `medium`, `hard`): Each level holds a conversation whose preference pair was assigned to that difficulty level, as described under Datasets.
- **User Turn**:
  - `"role"`: Specifies the speaker's role, set to `"user"`.
  - `"content"`: Contains the question or prompt.
- **Assistant Turn**:
  - `"role"`: Specifies the speaker's role, set to `"assistant"`.
  - `"chosen_content"`: The preferred assistant response.
  - `"rejected_content"`: The non-preferred assistant response.
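The entry layout described above can be flattened into one preference record per difficulty level. The following is a minimal sketch, not the authors' code: the helper names `entry_to_pairs` and `curriculum_stages` are hypothetical, the output keys `prompt`, `chosen`, and `rejected` are an assumed convention, and grouping records into easy, medium, and hard stages is only one plausible way to feed a curriculum.

```python
from typing import Any, Dict, List

# Order in which a curriculum would present the levels (easiest first).
LEVELS = ("easy", "medium", "hard")


def entry_to_pairs(entry: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten one dataset entry into a record per difficulty level.

    Assumption: each level's conversation contains at least one user turn
    and one assistant turn carrying `chosen_content` / `rejected_content`.
    """
    pairs = []
    for level in LEVELS:
        block = entry.get(level)
        if not block:
            continue  # tolerate entries that lack a level
        turns = block["conversation"]
        # The most recent user turn is treated as the prompt.
        prompt = next(
            t["content"] for t in reversed(turns) if t.get("role") == "user"
        )
        # The assistant turn is the one that carries both responses.
        reply = next(t for t in reversed(turns) if "chosen_content" in t)
        pairs.append(
            {
                "level": level,
                "prompt": prompt,
                "chosen": reply["chosen_content"],
                "rejected": reply["rejected_content"],
            }
        )
    return pairs


def curriculum_stages(
    entries: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, str]]]:
    """Group flattened records by level so stages can be consumed in order."""
    stages: Dict[str, List[Dict[str, str]]] = {level: [] for level in LEVELS}
    for entry in entries:
        for pair in entry_to_pairs(entry):
            stages[pair["level"]].append(pair)
    return stages
```

Iterating over `LEVELS` and reading `stages[level]` then yields the easy pairs first and the hard pairs last. Whether to train on each stage separately or on a single concatenated sequence is left to the training setup.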

## Citation

If you use this work, please cite:

```bibtex
@article{pattnaik2024curry,
  title={Curri-DPO: Enhancing Alignment Using Curriculum Learning \& Ranked Preferences},
  author={Pattnaik, Pulkit and Maheshwary, Rishabh and Ogueji, Kelechi and Yadav, Vikas and Madhusudhan, Sathwik Tejaswi},
  journal={arXiv preprint arXiv:2403.07230},
  year={2024}
}
```

---

Provider: maas
Created: 2025-01-29