HannahRoseKirk/prism-alignment

Name: HannahRoseKirk/prism-alignment
Creator: HannahRoseKirk
Published: 2024-04-25 09:11:35
License: 暂无描述

Hugging Face2024-04-25 更新2024-05-25 收录

下载链接：

https://hf-mirror.com/datasets/HannahRoseKirk/prism-alignment

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: cc language: - en tags: - alignment - human-feedback - ratings - preferences - ai-safety - llm - survey - fine-grained pretty_name: The PRISM Alignment Dataset size_categories: - 10K<n<100K configs: - config_name: survey data_files: "survey.jsonl" - config_name: conversations data_files: "conversations.jsonl" - config_name: utterances data_files: "utterances.jsonl" - config_name: metadata data_files: "metadata.jsonl" --- # Dataset Card for PRISM PRISM is a diverse human feedback dataset for preference and value alignment in Large Language Models (LLMs). It maps the characteristics and stated preferences of humans from a detailed survey onto their real-time interactions with LLMs and contextual preference ratings ## Dataset Details There are two sequential stages: first, participants complete a **Survey** where they answer questions about their demographics and stated preferences, then proceed to the **Conversations** with LLMs, where they input prompts, rate responses and give fine-grained feedback in a series of multi-turn interactions. We survey 1500 participants born in 75 countries, residing in 38 countries. The majority of these participants progress to the conversations phase (1,396, 93%). At the beginning of the conversation, a participant chooses from three conversation types: _Unguided_, _Values guided_ or _Controversy guided_. They then construct an opening prompt of their choosing. We include 21 different LLMs in the backend of our interface (with a mix of open-access and commerical API models). Four of these LLMs are selected at random for the opening turn of the conversations. The participant rates the model responses on a sliding cardinal scale between Terrible (1) and Perfect (100). The conversation continues with the highest-rated LLM in subsequent turns of human prompts and model responses (between 2-22 turns). After the first turn, the same model responds with an A and B response to the human prompt (sampled at a non-deterministic temperature). After the participant ends the conversation, they give fine-grained feedback on model performance, why they rated one model higher than the others avaliable, and natural language open-ended feedback on the conversation as a whole. Each participant is asked to have six conversations in total, equally split acorss conversation types (but there is some deviation from this quota). In total, there are 8,011 conversation trees, and 68,371 rated utterances (human prompt - model response - score 3-tuple). For more information on the dataset, please see our paper [The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models](https://arxiv.org/abs/2404.16019) or the [Codebook](https://github.com/HannahKirk/prism-alignment/blob/main/prism_codebook.pdf) on our Github. - **Curated by:** This project was primarily conducted and recieved ethics approval via the University of Oxford. The project was assisted by researchers at other various academic and industry institutions. - **Funded by:** This project was awarded the MetaAI Dynabench Grant "Optimising feedback between humans-and-models-in-the-loop". For additional compute support, the project was awarded the Microsoft Azure Accelerating Foundation Model Research Grant. For additional annotation support, we received funding from the OpenPhil grant and NSF grant (IIS-2340345) via New York University. We also received in the form of research access or credits from OpenAI, Anthropic, Aleph Alpha, Google, HuggingFace and Cohere. - **Language(s):** The majority of the dataset is in English (99%) due to task and crowdworker specifications. - **License:** Human-written texts (including prompts) within the dataset are licensed under the Creative Commons Attribution 4.0 International License (CC-BY-4.0). Model responses are licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0). Use of model responses must abide by the original model provider licenses. ### Dataset Sources - **Repository:** https://github.com/HannahKirk/prism-alignment - **Paper:** https://arxiv.org/abs/2404.16019 - **Website:** https://hannahkirk.github.io/prism-alignment/ ## Dataset Structure We release two primary jsonl files for our dataset. All variables are documented and explained in our [Code Book](https://github.com/HannahKirk/prism-alignment/blob/main/prism_codebook.pdf). 1. **The Survey** (`survey.jsonl`): The survey where users answer questions such as their stated preferences for LLM behaviours, their familarity with LLMs, a self-description and some basic demographics. Each row is a single user in our dataset, identified by a `user_id`. 2. **The Conversations** (`conversations.jsonl`): Each participants' multiple conversation trees with LLMs and associated feedback. Each row is a single conversation, identified by a `conversation_id`, that can be matched back to a participant's survey profile via the `user_id`. The conversation itself is stored as a list of dictionaries representing human and model turns in the `conversation_history` column, which broadly follows the format of widely used Chat APIs. We appreciate that different analyses require different data formats. In order to save people time in wrangling the conversational data into different formats we also present a long-format. Please note this contains the same data, just presented differently. 3. **The Utterances** (`utterances.jsonl`): Each row is a single scored utterance (human input - model response - score). Each row has an `utterance_id` that can be mapped back to the conversation data using `conversation_id` or the survey using `user_id`. The model responses and scores per each user input are in _long format_. Because of this format, the user inputs will be repeated for the set of model responses in a single interaction turn. Finally, for every text instance in PRISM, we provide: 4. **The Metadata** (`metadata.jsonl`): Each row is a text instance with attached information on language detection, personal or private information (PII) detection and moderation flags. ## Terms of Use ### Purpose The Dataset is provided for the purpose of research and educational use in the field of natural language processing, conversational agents, social science and related areas; and can be used to develop or evaluate artificial intelligence, including Large Language Models (LLMs). ### Usage Restrictions Users of the Dataset should adhere to the terms of use for a specific model when using its generated responses. This includes respecting any limitations or use case prohibitions set forth by the original model's creators or licensors. ### Content Warning The Dataset contains raw conversations that may include content considered unsafe or offensive. Users must apply appropriate filtering and moderation measures when using this Dataset for training purposes to ensure the generated outputs align with ethical and safety standards. ### No Endorsement of Content The conversations and data within this Dataset do not reflect the views or opinions of the Dataset creators, funders or any affiliated institutions. The dataset is provided as a neutral resource for research and should not be construed as endorsing any specific viewpoints. ### No Deanonymisation The User agrees not to attempt to re-identify or de-anonymise any individuals or entities represented in the Dataset. This includes, but is not limited to, using any information within the Dataset or triangulating other data sources to infer personal identities or sensitive information. ### Limitation of Liability The authors and funders of this Dataset will not be liable for any claims, damages, or other liabilities arising from the use of the dataset, including but not limited to the misuse, interpretation, or reliance on any data contained within. ## Data Statement We provide a full data statement in [our paper](https://arxiv.org/abs/2404.16019). There, we have detailed breakdowns of participant demographics and geographic information. ## Citation **BibTeX:** ``` @misc{kirk2024PRISM, title={The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models}, author={Hannah Rose Kirk and Alexander Whitefield and Paul Röttger and Andrew Bean and Katerina Margatina and Juan Ciro and Rafael Mosquera and Max Bartolo and Adina Williams and He He and Bertie Vidgen and Scott A. Hale}, year={2024}, url = {http://arxiv.org/abs/2404.16019}, eprint={2404.16019}, archivePrefix={arXiv}, primaryClass={cs.CL} } ``` **Suggested in-text citation (paper)**: Kirk, H. R., Whitefield, A., Röttger, P., Bean, A., Margatina, K., Ciro, J., Mosquera, R., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). _The PRISM Alignment Project: What Participatory, Representative and Individualised human feedback reveals about the Subjective and Multicultural alignment of Large Language Models._ arXiv preprint arXiv:xxxx.xxxxx. The dataset also has a DOI and a citation: ``` @misc{kirk2024PRISMdataset, author = {Kirk, Hannah Rose and Whitefield, Alexander and Röttger, Paul and Bean, Andrew and Margatina, Katerina and Ciro, Juan and Mosquera, Rafael and Bartolo, Max and Williams, Adina and He, He and Vidgen, Bertie and Hale, Scott A.}, title = {The PRISM Alignment Dataset}, year = {2024}, url = {https://huggingface.co/datasets/HannahRoseKirk/prism-alignment}, doi = {10.57967/hf/2113}, publisher = {Hugging Face} } ``` **Suggested in-text citation (dataset)**: Kirk, H. R., Whitefield, A., Röttger, P., Bean, A., Margatina, K., Ciro, J., Mosquera, R., Bartolo, M., Williams, A., He, H., Vidgen, B., & Hale, S. A. (2024). _The PRISM Alignment Dataset._ https://doi.org/10.57967/hf/2113. ## Dataset Card Authors Hannah Rose Kirk (hannah.kirk@oii.ox.ac.uk) ## Issue Reporting If there are any issues with the dataset, for example, the discovery of personal or private information (PII) or requests from participants for data removal, please report it to us via the [Issue Reporting Form](https://forms.gle/WFvqvDBNqbCteoZV7).

提供机构：

HannahRoseKirk

原始信息汇总

数据集概述

名称: The PRISM Alignment Dataset

描述: PRISM是一个多样化的人类反馈数据集，用于大型语言模型（LLMs）中的偏好和价值对齐。该数据集通过详细调查将人类的特点和声明偏好映射到与LLMs的实时交互和上下文偏好评分上。

语言: 主要为英语（99%）。

许可: 人类编写的文本（包括提示）根据Creative Commons Attribution 4.0 International License (CC-BY-4.0)授权。模型响应根据Creative Commons Attribution-NonCommercial 4.0 International License (CC-BY-NC-4.0)授权。使用模型响应必须遵守原始模型提供者的许可。

数据集结构

调查数据 (survey.jsonl): 包含用户对LLM行为偏好的回答、对LLM的熟悉度、自我描述和基本人口统计信息。每行代表一个用户，通过user_id标识。
对话数据 (conversations.jsonl): 包含参与者与LLMs的多轮交互和相关反馈。每行代表一个对话，通过conversation_id标识，可与用户的调查配置文件通过user_id匹配。
语音数据 (utterances.jsonl): 每行代表一个评分语音（人类输入 - 模型响应 - 评分）。每行通过utterance_id标识，可与对话数据通过conversation_id或调查通过user_id匹配。
元数据 (metadata.jsonl): 每行包含PRISM中每个文本实例的语言检测、个人或私人信息（PII）检测和审核标志。

数据集使用条款

目的: 用于自然语言处理、对话代理、社会科学及相关领域的研究和教育用途。
使用限制: 使用数据集时，应遵守特定模型的使用条款，包括尊重任何限制或使用案例禁令。
内容警告: 数据集包含可能被认为不安全或冒犯的内容，用户在使用此数据集进行训练时应采取适当的过滤和审核措施。
无内容背书: 数据集中的对话和数据不代表数据集创建者、资助者或任何附属机构的观点或意见。
无去匿名化: 用户同意不尝试重新识别或去匿名化数据集中的任何个人或实体。
责任限制: 数据集的作者和资助者不对数据集的使用引起的任何索赔、损害或其他责任负责。

数据集来源

存储库: https://github.com/HannahKirk/prism-alignment
论文: https://arxiv.org/abs/2404.16019
网站: https://hannahkirk.github.io/prism-alignment/

引用信息

论文:

@misc{kirk2024PRISM, title={The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models}, author={Hannah Rose Kirk and Alexander Whitefield and Paul Röttger and Andrew Bean and Katerina Margatina and Juan Ciro and Rafael Mosquera and Max Bartolo and Adina Williams and He He and Bertie Vidgen and Scott A. Hale}, year={2024}, url = {http://arxiv.org/abs/2404.16019}, eprint={2404.16019}, archivePrefix={arXiv}, primaryClass={cs.CL} }

数据集:

@misc{kirk2024PRISMdataset, author = {Kirk, Hannah Rose and Whitefield, Alexander and Röttger, Paul and Bean, Andrew and Margatina, Katerina and Ciro, Juan and Mosquera, Rafael and Bartolo, Max and Williams, Adina and He, He and Vidgen, Bertie and Hale, Scott A.}, title = {The PRISM Alignment Dataset}, year = {2024}, url = {https://huggingface.co/datasets/HannahRoseKirk/prism-alignment}, doi = {10.57967/hf/2113}, publisher = {Hugging Face} }

搜集汇总

数据集介绍

构建方式

在大型语言模型对齐研究领域，PRISM数据集的构建体现了严谨的参与式设计理念。其构建过程分为两个有序阶段：首先，来自75个出生国、38个居住国的1500名参与者完成了一项详尽的背景调查，内容涵盖人口统计学特征与主观偏好陈述。随后，其中1396名参与者进入与模型的对话阶段，他们从非引导、价值观引导或争议引导三种对话类型中选择其一，并自主构建初始提示。研究后端集成了21种不同的开源与商业大语言模型，在对话首轮随机呈现四种模型的响应供参与者通过滑动量表（1-100分）进行评分，后续轮次则锁定评分最高的模型进行多轮交互。对话结束后，参与者还需提供细粒度的性能反馈与开放式评价，最终形成了包含8,011个对话树和68,371条评分话语的丰富语料。

使用方法

为便于学术研究，数据集以多个JSON Lines文件形式发布，主要包含调查数据与对话数据。研究者可通过`user_id`将参与者的调查档案与其发起的多个对话树关联，每个对话的历史记录以符合通用聊天API格式的字典列表存储于`conversation_history`列中。对于专注于单轮交互分析的研究，可直接使用`utterances.jsonl`文件，其中每条记录对应一个包含人类提示、模型响应及评分的三元组。所有文本实例均配有详细的元数据，研究者在使用前应仔细查阅随附的代码手册以理解各变量含义，并严格遵守数据使用条款，特别是关于模型响应许可及禁止去匿名化的规定。

背景与挑战

背景概述

在大型语言模型（LLM）的快速发展浪潮中，如何确保模型与人类价值观对齐已成为人工智能安全领域的核心议题。PRISM对齐数据集于2024年由牛津大学主导，联合多所学术与产业机构的研究团队共同创建，旨在通过参与式、代表性和个体化的反馈机制，深入探究模型行为与人类主观偏好及多元文化背景之间的复杂关系。该数据集通过大规模调查与多轮对话交互，首次将用户的详细人口统计特征与陈述性偏好映射至其实时模型交互行为上，为理解LLM的价值观对齐提供了前所未有的细粒度实证基础，对推动人工智能的伦理对齐与安全部署具有深远影响。

当前挑战

PRISM数据集致力于解决大型语言模型价值观对齐这一复杂问题，其核心挑战在于如何有效捕捉并量化人类主观偏好与文化多样性对模型评价的影响。在构建过程中，研究团队面临多重困难：首先，需设计能够平衡参与广度与深度的实验框架，确保来自75个国家、38个居住地的1500名参与者的反馈既具代表性又能反映个体差异；其次，协调21种不同LLM的后端集成与随机分配机制，并在多轮对话中实现动态模型切换与精细评分，技术实现复杂度极高；此外，处理大规模对话树（8,011棵）与评分话语（68,371条）所涉及的数据结构化、隐私保护与多格式输出转换，亦对数据工程的严谨性提出了严峻考验。

常用场景

经典使用场景

在大型语言模型对齐研究领域，PRISM数据集为探索人类偏好与模型行为之间的复杂映射关系提供了关键资源。该数据集通过整合详尽的用户调查与实时对话交互，构建了包含用户人口统计特征、陈述性偏好及多轮对话评分的结构化反馈。研究者可借助其精细标注的对话树与评分三元组，深入分析不同文化背景、个体差异对模型响应评价的影响机制，从而为模型对齐策略的优化奠定实证基础。

解决学术问题

PRISM数据集有效应对了大型语言模型对齐研究中主观性与文化多样性表征不足的学术挑战。通过覆盖75个国家出生、38个国家居住的多样化参与者群体，该数据集突破了传统对齐数据在样本代表性上的局限，为量化分析模型响应在不同人口统计学维度上的接受度差异提供了可能。其多层次反馈结构——从宏观价值观调查到微观对话评分——使得研究者能够系统探究个体偏好如何动态塑造对齐标准，进而推动对齐理论从单一规范向多元包容范式演进。

实际应用

在实际应用层面，PRISM数据集为开发更具适应性与安全性的对话系统提供了校准基准。企业与研究机构可依据其细粒度评分数据，训练或评估模型在不同场景下的表现稳定性，特别是在涉及价值观引导或争议性话题的对话中优化响应策略。该数据集还可用于构建个性化对话代理，通过模拟多样化用户交互模式，提升模型在跨文化语境中的实用性与可接受度，从而降低部署风险并增强用户体验。

数据集最近研究