Curriculum_DPO_preferences
Source: ModelScope Community · Updated 2025-12-05 · Indexed 2025-02-01
Download link:
https://modelscope.cn/datasets/ServiceNow-AI/Curriculum_DPO_preferences
Description:
# Curriculum DPO Preference Pairs
This repository provides the curriculum DPO preference pairs used in the paper [Curri-DPO](https://aclanthology.org/2024.findings-emnlp.754/), which explores enhancing model alignment through curriculum learning and ranked preferences.
## Datasets
### Ultrafeedback
The Ultrafeedback dataset contains 64K preference pairs. We randomly sample 5K pairs, rank the responses for each prompt by score, and organize the resulting preference pairs into three difficulty levels: *easy*, *medium*, and *hard*.
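The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact procedure. It assumes each prompt has four scored responses, and it assumes (hypothetically) that every pair uses the top-scored response as `chosen`, with the rejected response drawn from progressively closer ranks as difficulty increases: worst for *easy*, third-best for *medium*, second-best for *hard*. The helper name `make_curriculum_pairs` is hypothetical.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: builds easy/medium/hard preference pairs for one prompt.
# Assumes exactly four responses, each given as (text, numeric quality score).
def make_curriculum_pairs(
    responses: List[Tuple[str, float]],
) -> Dict[str, Tuple[str, str]]:
    if len(responses) != 4:
        raise ValueError("this sketch expects exactly four scored responses")
    # Rank responses from highest to lowest score.
    ranked = sorted(responses, key=lambda r: r[1], reverse=True)
    best = ranked[0][0]
    # Assumed scheme: the chosen response is always the top-ranked one, and the
    # rejected response moves closer to it in rank as difficulty increases.
    return {
        "easy": (best, ranked[3][0]),    # largest score gap
        "medium": (best, ranked[2][0]),
        "hard": (best, ranked[1][0]),    # smallest score gap
    }


pairs = make_curriculum_pairs([("A", 7.5), ("B", 9.0), ("C", 4.0), ("D", 6.0)])
print(pairs)  # {'easy': ('B', 'C'), 'medium': ('B', 'D'), 'hard': ('B', 'A')}
```

The intuition behind this assumed scheme is that a large quality gap makes the preference easy to learn, while two near-tied responses make a hard pair.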
### Open-Assistant
The Open-Assistant dataset consists of human-annotated conversation trees. We use the hierarchical rankings within these trees to categorize preference pairs into *easy*, *medium*, and *hard* levels.
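One way to turn ranked conversation trees into leveled pairs is sketched below. Everything here is an assumption for illustration. The node layout (`role`, `text`, `rank`, `replies`) is hypothetical and not the official Open-Assistant schema. It assumes a lower `rank` marks a better sibling reply. The rank-gap-to-level mapping in `gap_to_level` is also hypothetical. The code walks the tree recursively and, at every user turn with at least two ranked assistant replies, pairs the best reply against each lower-ranked sibling.

```python
from typing import Any, Dict, Iterator, Tuple

# Hypothetical node layout for one conversation tree (not the official schema):
#   {"role": "user" | "assistant", "text": str, "rank": int or None, "replies": [child nodes]}
# Assumed convention: among sibling assistant replies, a lower rank is better (0 = best).
Node = Dict[str, Any]


def gap_to_level(gap: int) -> str:
    # Hypothetical mapping: the further apart two replies are in rank, the easier the pair.
    if gap >= 3:
        return "easy"
    if gap == 2:
        return "medium"
    return "hard"


def extract_pairs(node: Node) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (level, prompt, chosen, rejected) for every user turn with ranked replies."""
    replies = node.get("replies", [])
    if node["role"] == "user":
        ranked = sorted(
            (r for r in replies if r["role"] == "assistant" and r.get("rank") is not None),
            key=lambda r: r["rank"],
        )
        if len(ranked) >= 2:
            best = ranked[0]
            # Pair the top-ranked reply against every lower-ranked sibling.
            for other in ranked[1:]:
                level = gap_to_level(other["rank"] - best["rank"])
                # The prompt is only the current user turn, without earlier context.
                yield level, node["text"], best["text"], other["text"]
    # Recurse so that user turns deeper in the tree are also visited.
    for child in replies:
        yield from extract_pairs(child)


tree = {
    "role": "user", "text": "Q", "replies": [
        {"role": "assistant", "text": "a0", "rank": 0, "replies": []},
        {"role": "assistant", "text": "a1", "rank": 1, "replies": []},
        {"role": "assistant", "text": "a3", "rank": 3, "replies": []},
    ],
}
for pair in extract_pairs(tree):
    print(pair)
```

A real pipeline would likely also carry the earlier turns of the conversation into the prompt. This sketch keeps only the immediate user message for brevity.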
For more information, please refer to the paper: [Enhancing Alignment using Curriculum Learning & Ranked Preferences](https://aclanthology.org/2024.findings-emnlp.754/).
## Data Format
Each data entry is structured as a conversation between a user and an assistant, organized by difficulty level. The format is as follows:
```json
{
  "easy": {
    "conversation": [
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "chosen_content": "",
        "rejected_content": ""
      }
    ]
  },
  "medium": {
    "conversation": [
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "chosen_content": "",
        "rejected_content": ""
      }
    ]
  },
  "hard": {
    "conversation": [
      {
        "role": "user",
        "content": ""
      },
      {
        "role": "assistant",
        "chosen_content": "",
        "rejected_content": ""
      }
    ]
  }
}
```
- **Difficulty Levels** (`easy`, `medium`, `hard`): Each level holds one conversation. Its preference pair is assigned to that level according to the response scores (Ultrafeedback) or rankings (Open-Assistant) described above.
- **User Turn**:
  - `"role"`: The speaker's role, always `"user"`.
  - `"content"`: The question or prompt.
- **Assistant Turn**:
  - `"role"`: The speaker's role, always `"assistant"`.
  - `"chosen_content"`: The preferred (higher-ranked) response.
  - `"rejected_content"`: The non-preferred (lower-ranked) response.
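The sketch below flattens entries in this format into prompt/chosen/rejected records and orders them easy, then medium, then hard. It is one simple way to prepare the data for curriculum-style DPO training. The training schedule itself is not described in this card and may differ. The record keys (`level`, `prompt`, `chosen`, `rejected`) and the helper names `flatten_entry` and `curriculum_order` are hypothetical. The card does not specify how the files are stored on disk, so the example parses an in-memory JSON string.

```python
import json
from typing import Any, Dict, Iterable, List

LEVELS = ("easy", "medium", "hard")  # curriculum order: easiest pairs first


def flatten_entry(entry: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert one entry into flat prompt/chosen/rejected records, easiest level first."""
    records = []
    for level in LEVELS:
        if level not in entry:  # tolerate entries that lack a level
            continue
        turns = entry[level]["conversation"]
        # Assumes one user turn followed by one assistant turn, as in the schema above.
        user = next(t for t in turns if t["role"] == "user")
        assistant = next(t for t in turns if t["role"] == "assistant")
        records.append({
            "level": level,
            "prompt": user["content"],
            "chosen": assistant["chosen_content"],
            "rejected": assistant["rejected_content"],
        })
    return records


def curriculum_order(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Bucket records from all entries by level, then emit easy, medium, hard in turn."""
    buckets: Dict[str, List[Dict[str, str]]] = {level: [] for level in LEVELS}
    for entry in entries:
        for record in flatten_entry(entry):
            buckets[record["level"]].append(record)
    return [record for level in LEVELS for record in buckets[level]]


raw = '''{"easy": {"conversation": [
  {"role": "user", "content": "Q1"},
  {"role": "assistant", "chosen_content": "good", "rejected_content": "bad"}]},
 "hard": {"conversation": [
  {"role": "user", "content": "Q2"},
  {"role": "assistant", "chosen_content": "ok", "rejected_content": "slightly worse"}]}}'''
for record in curriculum_order([json.loads(raw)]):
    print(record["level"], record["prompt"])
```

Bucketing across all entries, rather than interleaving levels entry by entry, means every easy pair is seen before any medium or hard pair. That is the usual reading of a curriculum.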
## Citation
If you use this work, please cite:
```bibtex
@article{pattnaik2024curry,
title={Curri-DPO: Enhancing Alignment Using Curriculum Learning \& Ranked Preferences},
author={Pattnaik, Pulkit and Maheshwary, Rishabh and Ogueji, Kelechi and Yadav, Vikas and Madhusudhan, Sathwik Tejaswi},
journal={arXiv preprint arXiv:2403.07230},
year={2024}
}
```
Provided by:
maas
Created:
2025-01-29



