preference-dissection
Source: ModelScope community (updated 2026-01-02, indexed 2025-02-15)
Download link: https://modelscope.cn/datasets/GAIR/preference-dissection
## Introduction
We release the annotated data used in [Dissecting Human and LLM Preferences](https://arxiv.org/abs/2402.11296).
*Original Dataset* - The dataset is based on [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations), which contains 33K cleaned conversations with pairwise human preferences, collected from 13K unique IP addresses on [Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) from April to June 2023.
*Filtering and Scenario-wise Sampling* - We filter out conversations that are not in English, conversations labeled "Tie" or "Both Bad", and multi-turn conversations. We first sample 400 samples with unsafe queries, identified by the OpenAI moderation API tags and the additional toxic tags in the original dataset. We then apply [Auto-J's scenario classifier](https://huggingface.co/GAIR/autoj-scenario-classifier) to determine the scenario of each sample, merging Auto-J's scenarios into 10 new ones. For the *Knowledge-aware* and *Others* scenarios we pick 820 samples each, and for each of the other scenarios we pick 400 samples. The total number of samples is 5,240.
*Collecting Preferences* - Besides the human preference labels in this original dataset, we also collect the binary preference labels from 32 LLMs, including 2 proprietary LLMs and 30 open-source ones.
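
As an illustration of how these labels might be used, the sketch below computes, for each LLM judge, the fraction of samples on which its label matches the human label. It assumes every record carries a `preference_labels` mapping in the layout shown in the Dataset Overview. The function name `agreement_with_human`, the judge name `some-open-llm`, and the toy records are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def agreement_with_human(records):
    """Return {judge: fraction of samples where the judge's label equals the human label}."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        labels = record["preference_labels"]
        human = labels["human"]
        for judge, label in labels.items():
            if judge == "human":
                continue  # the human label is the reference, not a judge
            totals[judge] += 1
            matches[judge] += int(label == human)
    return {judge: matches[judge] / totals[judge] for judge in totals}


# Toy records for illustration only (hypothetical values).
toy_records = [
    {"preference_labels": {"human": "response_1",
                           "gpt-4-turbo": "response_1",
                           "some-open-llm": "response_2"}},
    {"preference_labels": {"human": "response_2",
                           "gpt-4-turbo": "response_1",
                           "some-open-llm": "response_2"}},
]

agreement = agreement_with_human(toy_records)
print(agreement)
```

Simple agreement is only one possible summary; it treats every sample equally and ignores scenario or property information.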
*Annotation on Defined Properties* - We define a set of 29 properties and annotate how well each property is satisfied (as a Likert-scale rating or a property-specific annotation) in all responses ($5,240\times 2=10,480$). See our paper for more details on the defined properties.
## Dataset Overview
An example of the JSON format is as follows:
```json
{
    "query": "...",
    "scenario_auto-j": "...",
    "scenario_group": "...",
    "response_1": {
        "content": "...",
        "model": "...",
        "num_words": "..."
    },
    "response_2": {...},
    "gpt-4-turbo_reference": "...",
    "clear intent": "Yes/No",
    "explicitly express feelings": "Yes/No",
    "explicit constraints": [
        ...
    ],
    "explicit subjective stances": [
        ...
    ],
    "explicit mistakes or biases": [
        ...
    ],
    "preference_labels": {
        "human": "response_1/response_2",
        "gpt-4-turbo": "response_1/response_2",
        ...
    },
    "basic_response_1": {
        "admit limitations or mistakes": 0/1/2/3,
        "authoritative tone": 0/1/2/3,
        ...
    },
    "basic_response_2": {...},
    "errors_response_1": {
        "applicable or not": "applicable/not applicable",
        "errors": [
            {
                "brief description": "...",
                "severity": "severe/moderate/minor",
                "type": "..."
            },
            ...
        ]
    },
    "errors_response_2": {...},
    "query-specific_response_1": {
        "clarify user intent": ...,
        "correcting explicit mistakes or biases": None,
        "satisfying explicit constraints": [
            ...
        ],
        "showing empathetic": [
            ...
        ],
        "supporting explicit subjective stances": [
            ...
        ]
    },
    "query-specific_response_2": {...}
}
```
The following fields are basic information:
- **query**: The user query.
- **scenario_auto-j**: The scenario assigned by Auto-J's classifier.
- **scenario_group**: One of the 10 new scenarios merged from Auto-J's scenarios, including an *Unsafe Query* scenario.
- **response_1/response_2**: The content of a response:
  - **content**: The text content.
  - **model**: The model that generated this response.
  - **num_words**: The number of words in this response, determined by NLTK.
- **gpt-4-turbo_reference**: A reference response generated by GPT-4-Turbo.
The following fields are Query-Specific prerequisites. For the last three, the list may be empty if the query contains no constraints/stances/mistakes.
- **clear intent**: Whether the intent of the user is clearly expressed in the query, "Yes" or "No".
- **explicitly express feelings**: Whether the user clearly expresses their feelings or emotions in the query, "Yes" or "No".
- **explicit constraints**: A list containing all the explicit constraints in the query.
- **explicit subjective stances**: A list containing all the subjective stances in the query.
- **explicit mistakes or biases**: A list containing all the mistakes or biases in the query.
The following fields are the main body of the annotation.
- **preference_labels**: The preference label from each judge (human or an LLM), indicating which response in the pair is preferred, "response_1/response_2".
- **basic_response_1/basic_response_2**: The annotated ratings of the 20 basic properties (except *lengthy*) for the response.
  - **property_name**: 0/1/2/3
  - ...
- **errors_response_1/errors_response_2**: The detected errors of the response.
  - **applicable or not**: Whether GPT-4-Turbo judges that it can reliably detect the errors in the response, "applicable/not applicable".
  - **errors**: A list containing the detected errors in the response.
    - **brief description**: A brief description of the error.
    - **severity**: How much the error affects the overall correctness of the response, "severe/moderate/minor".
    - **type**: The type of the error, "factual error/information contradiction to the query/math operation error/code generation error".
- **query-specific_response_1/query-specific_response_2**: The annotation results of the Query-Specific properties.
  - **clarify user intent**: If the user intent is not clear, rate how much the response helps clarify the intent, 0/1/2/3.
  - **showing empathetic**: If the user expresses feelings or emotions, rate how much empathy the response shows, 0/1/2/3.
  - **satisfying explicit constraints**: If there are explicit constraints in the query, rate how well the response satisfies each of them.
    - A list of "{description of constraint} | 0/1/2/3"
  - **correcting explicit mistakes or biases**: If there are mistakes or biases in the query, classify how the response corrects each of them.
    - A list of "{description of mistake} | Pointed out and corrected/Pointed out but not corrected/Corrected without being pointed out/Neither pointed out nor corrected"
  - **supporting explicit subjective stances**: If there are subjective stances in the query, classify how the response supports each of them.
    - A list of "{description of stance} | Strongly supported/Weakly supported/Neutral/Weakly opposed/Strongly opposed"
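
The list-valued Query-Specific fields above pack a description and a label into one string. A minimal sketch of unpacking them follows, assuming the items are plain strings of the form `{description} | {label}` as described above (the released files may differ in detail). The helper names `parse_annotation` and `constraint_scores` and the toy input are hypothetical.

```python
def parse_annotation(item):
    """Split '{description} | {label}' on the LAST '|' into (description, label)."""
    description, sep, label = item.rpartition("|")
    if not sep:
        raise ValueError(f"no '|' separator in {item!r}")
    return description.strip(), label.strip()


def constraint_scores(query_specific):
    """Return [(constraint description, integer rating)] for one response.

    A missing field or a None value is treated as an empty list.
    """
    items = query_specific.get("satisfying explicit constraints") or []
    return [(desc, int(label)) for desc, label in map(parse_annotation, items)]


# Toy annotation for illustration only (hypothetical values).
toy_query_specific = {
    "satisfying explicit constraints": [
        "answer in one sentence | 2",
        "use a | b notation | 3",
    ],
    "correcting explicit mistakes or biases": None,
}

scores = constraint_scores(toy_query_specific)
print(scores)
# [('answer in one sentence', 2), ('use a | b notation', 3)]
```

Splitting on the last `|` keeps descriptions intact even when they contain a `|` themselves. The same `parse_annotation` helper could be applied to the mistake and stance lists, whose labels are text rather than integers.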
## Statistics
👇 Number of samples meeting each of the 5 Query-Specific prerequisites.
| Prerequisite | # | Prerequisite | # |
| ------------------------- | ----- | ---------------- | ---- |
| with explicit constraints | 1,418 | unclear intent | 459 |
| show subjective stances | 388 | express feelings | 121 |
| contain mistakes or bias | 401 | | |
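
Counts of this kind can be derived from the prerequisite fields of each record. The sketch below is one way to do so, assuming the field names from the Dataset Overview; it is not the authors' script, and whether it reproduces the table exactly has not been verified. The names `prerequisite_flags` and `count_prerequisites` and the toy records are hypothetical.

```python
from collections import Counter


def prerequisite_flags(record):
    """Map each prerequisite (named as in the table) to whether the query meets it."""
    return {
        "with explicit constraints": bool(record.get("explicit constraints")),
        "show subjective stances": bool(record.get("explicit subjective stances")),
        "contain mistakes or bias": bool(record.get("explicit mistakes or biases")),
        "unclear intent": record.get("clear intent") == "No",
        "express feelings": record.get("explicitly express feelings") == "Yes",
    }


def count_prerequisites(records):
    """Count, per prerequisite, how many records meet it."""
    counts = Counter()
    for record in records:
        counts.update(name for name, met in prerequisite_flags(record).items() if met)
    return counts


# Toy records for illustration only (hypothetical values).
toy_records = [
    {"clear intent": "No", "explicitly express feelings": "No",
     "explicit constraints": ["at most 50 words"],
     "explicit subjective stances": [], "explicit mistakes or biases": []},
    {"clear intent": "Yes", "explicitly express feelings": "Yes",
     "explicit constraints": [],
     "explicit subjective stances": ["prefers option A"],
     "explicit mistakes or biases": []},
]

counts = count_prerequisites(toy_records)
print(dict(counts))
```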
👇 Mean score/count for each property in the collected data. \*The mean scores of the 5 query-specific properties are calculated only on samples whose queries meet the corresponding prerequisites.
| Property | Mean Score/Count | Property | Mean Score/Count |
| ---------------------------- | ---------------- | ---------------------------- | ---------------- |
| **Mean Score**               |                  |                              |                  |
| harmless | 2.90 | persuasive | 0.27 |
| grammatically correct        | 2.70             | step-by-step                 | 0.37             |
| friendly | 1.79 | use informal expressions | 0.04 |
| polite | 2.78 | clear | 2.54 |
| interactive | 0.22 | contain rich information | 1.74 |
| authoritative | 1.67 | novel | 0.47 |
| funny | 0.08 | relevant | 2.45 |
| use rhetorical devices | 0.16 | clarify intent* | 1.33 |
| complex word & sentence | 0.89 | show empathetic* | 1.48 |
| use supporting materials | 0.13 | satisfy constraints* | 2.01 |
| well formatted | 1.26 | support stances* | 2.28 |
| admit limits | 0.17 | correct mistakes* | 1.08 |
| **Mean Count**               |                  |                              |                  |
| severe errors | 0.59 | minor errors | 0.23 |
| moderate errors | 0.61 | length | 164.52 |
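
A rough sketch of how such means could be computed from the records follows. It averages over every response (two per record), which is an assumption: the card does not state whether responses marked "not applicable" for error detection were excluded, so the results need not match the table. All function names and the toy record are hypothetical.

```python
from statistics import mean


def iter_responses(records):
    """Yield (basic ratings, error annotation) for both responses of each record."""
    for record in records:
        for i in (1, 2):
            yield record[f"basic_response_{i}"], record[f"errors_response_{i}"]


def mean_basic_score(records, prop):
    """Mean 0-3 rating of one basic property over all responses."""
    return mean(basic[prop] for basic, _ in iter_responses(records))


def mean_error_count(records, severity):
    """Mean number of errors with the given severity per response."""
    return mean(
        sum(error["severity"] == severity for error in (errors.get("errors") or []))
        for _, errors in iter_responses(records)
    )


# Toy record for illustration only (hypothetical values).
toy_records = [{
    "basic_response_1": {"authoritative tone": 2},
    "basic_response_2": {"authoritative tone": 1},
    "errors_response_1": {
        "applicable or not": "applicable",
        "errors": [
            {"brief description": "wrong date", "severity": "severe", "type": "factual error"},
            {"brief description": "rounding slip", "severity": "minor", "type": "math operation error"},
        ],
    },
    "errors_response_2": {"applicable or not": "not applicable", "errors": []},
}]

tone = mean_basic_score(toy_records, "authoritative tone")
severe = mean_error_count(toy_records, "severe")
print(tone, severe)
```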
👇 Property correlation in the annotated data.
<img src="./property_corr.PNG" alt="image-20240213145030747" style="zoom: 50%;" />
## Disclaimers and Terms
*This part is copied from the original dataset.*
- **This dataset contains conversations that may be considered unsafe, offensive, or upsetting.** It is not intended for training dialogue agents without applying appropriate filtering measures. We are not responsible for any outputs of the models trained on this dataset.
- Statements or opinions made in this dataset do not reflect the views of researchers or institutions involved in the data collection effort.
- Users of this data are responsible for ensuring its appropriate use, which includes abiding by any applicable laws and regulations.
- Users of this data should adhere to the terms of use for a specific model when using its direct outputs.
- Users of this data agree to not attempt to determine the identity of individuals in this dataset.
## License
Following the original dataset, this dataset is licensed under CC-BY-NC-4.0.
## Citation
```
@article{li2024dissecting,
title={Dissecting Human and LLM Preferences},
author={Li, Junlong and Zhou, Fan and Sun, Shichao and Zhang, Yikai and Zhao, Hai and Liu, Pengfei},
journal={arXiv preprint arXiv:2402.11296},
year={2024}
}
```
Provider: maas
Created: 2025-02-08



