Download link:

https://modelscope.cn/datasets/ServiceNow-AI/Curriculum_DPO_preferences

Resource description:

# Curriculum DPO Preference Pairs

This repository provides the curriculum DPO preference pairs used in the paper [Curri-DPO](https://aclanthology.org/2024.findings-emnlp.754/), which explores enhancing model alignment through curriculum learning and ranked preferences.

## Datasets

### Ultrafeedback

The Ultrafeedback dataset contains 64K preference pairs. We randomly sample 5K pairs and rank the responses for each prompt, organizing them into three difficulty levels (*easy*, *medium*, and *hard*) based on response scores.

### Open-Assistant

The Open-Assistant dataset consists of human-annotated conversation trees. We use the hierarchical rankings within these trees to categorize preference pairs into *easy*, *medium*, and *hard* levels.

For more information, please refer to the paper: [Enhancing Alignment using Curriculum Learning & Ranked Preferences](https://aclanthology.org/2024.findings-emnlp.754/).

## Data Format

Each data entry is structured as a conversation between a user and an assistant, organized by difficulty level. The format is as follows:

```python
{
    "easy": {
        "conversation": [
            { "role": "user", "content": "" },
            { "role": "assistant", "chosen_content": "", "rejected_content": "" }
        ]
    },
    "medium": {
        "conversation": [
            { "role": "user", "content": "" },
            { "role": "assistant", "chosen_content": "", "rejected_content": "" }
        ]
    },
    "hard": {
        "conversation": [
            { "role": "user", "content": "" },
            { "role": "assistant", "chosen_content": "", "rejected_content": "" }
        ]
    }
}
```

- **Difficulty Levels** (`easy`, `medium`, `hard`): Each level contains the conversations assigned to that difficulty.
- **User Turn**:
  - `"role"`: Specifies the speaker's role, set to `"user"`.
  - `"content"`: Contains the question or prompt.
- **Assistant Turn**:
  - `"role"`: Specifies the speaker's role, set to `"assistant"`.
  - `"chosen_content"`: The preferred response.
  - `"rejected_content"`: The non-preferred response.
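The entry layout described above can be flattened into the (prompt, chosen, rejected) triples that DPO training code typically expects. The following is a minimal sketch, not the authors' pipeline: the helper name `entry_to_pairs`, the sample entry, and the choice to use the most recent user turn as the prompt are all assumptions made for illustration. Only the field names come from the format above.

```python
from typing import Dict, Iterator, Tuple

# Curriculum order, easiest first (level names taken from the data format).
LEVELS = ("easy", "medium", "hard")


def entry_to_pairs(entry: Dict) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (level, prompt, chosen, rejected) for each level in one entry.

    Hypothetical helper: the prompt is assumed to be the content of the
    most recent user turn before the assistant turn.
    """
    for level in LEVELS:
        block = entry.get(level)
        if not block:
            continue  # tolerate entries that lack a level
        prompt = None
        for turn in block["conversation"]:
            if turn["role"] == "user":
                prompt = turn["content"]
            elif turn["role"] == "assistant" and prompt is not None:
                yield (
                    level,
                    prompt,
                    turn["chosen_content"],
                    turn["rejected_content"],
                )


# Made-up entry that follows the documented structure.
example = {
    "easy": {
        "conversation": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "chosen_content": "4", "rejected_content": "5"},
        ]
    },
    "medium": {
        "conversation": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "chosen_content": "4", "rejected_content": "4.1"},
        ]
    },
    "hard": {
        "conversation": [
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "chosen_content": "4", "rejected_content": "Four-ish"},
        ]
    },
}

pairs = list(entry_to_pairs(example))
for level, prompt, chosen, rejected in pairs:
    print(f"[{level}] {prompt!r}: chosen={chosen!r} rejected={rejected!r}")
```

Iterating levels in a fixed `easy`, `medium`, `hard` order keeps the pairs in curriculum order, so they can be grouped into successive training stages if desired.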
## Citation

If you use this work, please cite:

```bibtex
@article{pattnaik2024curry,
  title={Curri-DPO: Enhancing Alignment Using Curriculum Learning \& Ranked Preferences},
  author={Pattnaik, Pulkit and Maheshwary, Rishabh and Ogueji, Kelechi and Yadav, Vikas and Madhusudhan, Sathwik Tejaswi},
  journal={arXiv preprint arXiv:2403.07230},
  year={2024}
}
```

---

Application scenarios: