rpr
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-07-26.
Download link:
https://modelscope.cn/datasets/microsoft/rpr
Resource description:
# Dataset Card for the Reasonable Preference Reversal (RPR) Dataset ([Paper](https://arxiv.org/abs/2407.14916))
The RPR dataset is a synthetic context-conditioned preference dataset consisting of tuples of a prompt, a context (either a criterion or a scenario), and preference judgments. Its primary objective is to support research and development in natural language processing (NLP), particularly the development of context-aware preference and reward models.
## Dataset Details
### Dataset Description
The RPR dataset is a synthetic context-conditioned preference dataset that includes over 20,000 paired tuples of a prompt, a context (either a criterion or a scenario), and preference judgments. The samples are paired so that the preference between two completions for the same prompt is entirely ambiguous without context: for every context, there is an alternative context under which the preference reverses. This design ensures that preference prediction performance on this dataset is determined solely by the model’s ability to attend to and interpret the context.
See the [paper](https://arxiv.org/abs/2407.14916) (Section 4 and Appendix B) for additional dataset details, including the motivation and the prompts used to synthesize the dataset.
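
The pairing property can be illustrated with a small sketch. It assumes the field names listed under Dataset Structure; the toy row, the helper `context_blind_accuracy`, and the two baseline predictors are hypothetical and written only for illustration. Because every row contributes two contexts that favor `response_a` and two that favor `response_b`, any predictor that ignores the context scores exactly 50%.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Contexts under which each response is preferred, per the dataset card.
PREFERS_A = ("criteria_x", "scenario_x")
PREFERS_B = ("criteria_y", "scenario_y")


def context_blind_accuracy(
    rows: List[Row], predict: Callable[[str, str, str], str]
) -> float:
    """Accuracy of a predictor that sees prompt and responses but no context."""
    correct = 0
    total = 0
    for row in rows:
        # The guess ("a" or "b") cannot depend on the context by construction.
        guess = predict(row["prompt"], row["response_a"], row["response_b"])
        for field in PREFERS_A + PREFERS_B:
            label = "a" if field in PREFERS_A else "b"
            correct += int(guess == label)
            total += 1
    return correct / total


# Hypothetical toy row; the text is invented and not taken from the dataset.
toy_rows = [
    {
        "prompt": "Explain what a hash map is.",
        "response_a": "A hash map stores key-value pairs.",
        "response_b": "A hash map maps keys to buckets via a hash function, "
        "giving average O(1) lookups.",
        "criteria_x": "Prefer the shortest answer.",
        "criteria_y": "Prefer the most technically detailed answer.",
        "scenario_x": "The user is a child asking out of curiosity.",
        "scenario_y": "The user is preparing for a systems interview.",
    }
]


def always_a(prompt: str, a: str, b: str) -> str:
    return "a"


def pick_longer(prompt: str, a: str, b: str) -> str:
    return "a" if len(a) >= len(b) else "b"


print(context_blind_accuracy(toy_rows, always_a))     # 0.5
print(context_blind_accuracy(toy_rows, pick_longer))  # 0.5
```

A model has to read the context to score above this baseline, which is what makes the dataset a controlled test of context sensitivity.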
- **Curated by:** Silviu Pitis, Ziang Xiao, Nicolas Le Roux, and Alessandro Sordoni (Microsoft Research Montreal)
- **Language(s) (NLP):** English
## Uses
### Direct Use
The dataset can be used for training and evaluating context-aware preference models, particularly in tasks requiring context understanding and preference determination. It offers a controlled environment for experimenting with preference modeling.
## Dataset Structure
Each row includes the fields `prompt`, `response_a`, `response_b`, `criteria_x`, `criteria_y`, `scenario_x`, and `scenario_y`. `response_a` should be preferred to `response_b` when the context is `criteria_x` or `scenario_x`. Conversely, `response_b` should be preferred given `criteria_y` or `scenario_y`.
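
As a minimal sketch of how this layout might be consumed, the snippet below expands one row into four context-conditioned preference examples. The function `expand_row`, the output keys (`context_type`, `context`, `chosen`, `rejected`), and the toy row are assumptions made for illustration; they are not part of the dataset or of any official loader.

```python
from typing import Dict, List


def expand_row(row: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn one RPR-style row into four (context, chosen, rejected) examples."""
    examples = []
    for field in ("criteria_x", "scenario_x", "criteria_y", "scenario_y"):
        # The *_x contexts favor response_a; the *_y contexts favor response_b.
        a_preferred = field.endswith("_x")
        examples.append(
            {
                "prompt": row["prompt"],
                "context_type": field.split("_")[0],  # "criteria" or "scenario"
                "context": row[field],
                "chosen": row["response_a"] if a_preferred else row["response_b"],
                "rejected": row["response_b"] if a_preferred else row["response_a"],
            }
        )
    return examples


# Hypothetical toy row; the text is invented and not taken from the dataset.
toy_row = {
    "prompt": "Suggest a weekend activity.",
    "response_a": "Go for a long hike in the hills.",
    "response_b": "Visit the new exhibit at the art museum.",
    "criteria_x": "Prefer outdoor activities.",
    "criteria_y": "Prefer indoor activities.",
    "scenario_x": "The weather forecast is sunny and mild.",
    "scenario_y": "Heavy rain is expected all weekend.",
}

for example in expand_row(toy_row):
    print(example["context_type"], "|", example["context"], "->", example["chosen"])
```

Each row yields two examples per response, so the expanded set stays balanced between `response_a` and `response_b` as the chosen completion.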
## Dataset Creation
See Appendix B of the [paper](https://arxiv.org/abs/2407.14916).
## Bias, Risks, and Limitations
- The dataset is primarily in English.
- Synthetic data may not exhibit the same richness and diversity as real-world data.
- The preferences indicated are based on pre-defined criteria and may not align with all potential user perspectives.
### Recommendations
- Users should be aware that the performance of systems trained on synthetic data may differ when deployed in real-world scenarios.
## Citation
**BibTeX:**
```
@misc{pitis2024improvingcontextawarepreferencemodeling,
title={Improving Context-Aware Preference Modeling for Language Models},
author={Silviu Pitis and Ziang Xiao and Nicolas Le Roux and Alessandro Sordoni},
year={2024},
eprint={2407.14916},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.14916},
}
```
## Dataset Card Contact
Silviu Pitis (silviu.pitis@gmail.com)
Provider:
maas
Created:
2025-07-22



