tulu-3-pref-personas-instruction-following
Hosted on ModelScope (魔搭社区). Updated 2025-12-05; indexed 2024-11-30.
Download link:
https://modelscope.cn/datasets/allenai/tulu-3-pref-personas-instruction-following
Description:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
### Dataset Descriptions
This dataset contains **19890** synthetically created preference examples designed to improve models' ability to follow instructions precisely while satisfying several constraints. The dataset contains preference pairs (chosen and rejected responses) and can be used with preference tuning methods (e.g., PPO, DPO).
#### Dataset Construction
To create this dataset, we took a subset of its supervised-tuning version [here](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following) and converted it into a preference dataset. Specifically, we rewrote each prompt in the subset to relax one of the given constraints, so that the response to the modified prompt is no longer a valid response to the original prompt. We use the response to the modified prompt as the `rejected` response.
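The construction step can be illustrated with a minimal sketch. Everything in it is hypothetical and is not the authors' pipeline: `PreferencePair`, `build_pair`, `uses_no_commas`, and `stub_generate` are illustrative names, the relaxed prompt is written by hand, and the generator is a canned stand-in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    """Hypothetical container mirroring the fields described in this card."""
    prompt: str
    constraints: List[str]
    chosen: str
    rejected: str


def build_pair(
    prompt: str,
    constraints: List[str],
    chosen: str,
    relaxed_prompt: str,
    generate: Callable[[str], str],
) -> PreferencePair:
    """Pair the original response with a response to the relaxed prompt.

    The rejected response answers `relaxed_prompt`, in which one constraint
    was dropped, so it is expected to violate that constraint when judged
    against the original `prompt`.
    """
    rejected = generate(relaxed_prompt)
    return PreferencePair(prompt, constraints, chosen, rejected)


def uses_no_commas(text: str) -> bool:
    """Toy verifier for the assumed constraint 'use no commas'."""
    return "," not in text


def stub_generate(prompt: str) -> str:
    """Stand-in for a model call (assumption): returns a canned response."""
    return "Tides rise, fall, and rise again."


pair = build_pair(
    prompt="Describe the tides in one sentence. Use no commas.",
    constraints=["use no commas"],
    chosen="Tides rise and fall as the moon pulls on the sea.",
    relaxed_prompt="Describe the tides in one sentence.",
    generate=stub_generate,
)

# The chosen response satisfies the original constraint; the rejected one does not.
print(uses_no_commas(pair.chosen), uses_no_commas(pair.rejected))
```

In a real pipeline the relaxed prompt and both responses would come from a model, and it would be worth verifying that each rejected response actually fails the dropped constraint before keeping the pair.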
- **Paper:** [TBD]()
- **Repository:** [TBD]()
- **Language(s) (NLP):** English
- **License:** ODC-BY
- **Point of Contact:** [Faeze Brahman](mailto:faezeb@allenai.org)
### Loading
```python
from datasets import load_dataset
dataset = load_dataset("allenai/tulu-3-pref-personas-instruction-following")["train"]
```
### Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follows:
- id (str): a unique identifier
- prompt (str): an instruction-following prompt grounded in a given persona/scenario
- constraints (list of str): a list of verifiable constraints that need to be satisfied by the assistant response
- chosen (str): the chosen response to the given instruction-following prompt, satisfying the constraints
- rejected (str): the rejected response failing to satisfy one of the given constraints
- chosen_model (str): model used to generate the chosen response
- rejected_model (str): model used to generate the rejected response
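
As a rough sketch of how these fields might be consumed, the helper below reduces one record to the `prompt`/`chosen`/`rejected` triple that pairwise preference methods such as DPO typically expect. It assumes the field names and string types listed above; `to_preference_triple`, `convert`, and `toy_example` are hypothetical, and the hosted data may store some fields in a different shape, so inspect a real row before relying on this.

```python
from typing import Any, Dict, Iterable, List

# Fields this sketch relies on, taken from the list above.
REQUIRED_FIELDS = ("id", "prompt", "constraints", "chosen", "rejected")


def to_preference_triple(example: Dict[str, Any]) -> Dict[str, str]:
    """Validate one record and keep only the fields needed for pairwise training."""
    missing = [name for name in REQUIRED_FIELDS if name not in example]
    if missing:
        raise KeyError(f"example is missing fields: {missing}")
    if example["chosen"] == example["rejected"]:
        # A pair with identical responses carries no preference signal.
        raise ValueError(f"example {example['id']} has identical responses")
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"],
        "rejected": example["rejected"],
    }


def convert(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Apply the conversion to any iterable of records."""
    return [to_preference_triple(example) for example in examples]


# Made-up record for illustration only; it is not taken from the dataset.
toy_example = {
    "id": "toy-0",
    "prompt": "Describe the tides in one sentence. Use no commas.",
    "constraints": ["use no commas"],
    "chosen": "Tides rise and fall as the moon pulls on the sea.",
    "rejected": "Tides rise, fall, and rise again.",
}

triples = convert([toy_example])
print(triples[0]["rejected"])
```

The same `convert` call should accept the rows of the `dataset` object from the Loading section, since iterating over it yields one dictionary per example.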

Provider:
maas
Created:
2025-05-28