kaist-ai/Multifaceted-Collection-ORPO|语言模型训练数据集|偏好对齐数据集

hugging_face2024-07-01 更新2024-06-15 收录

语言模型训练

偏好对齐

下载链接：

https://hf-mirror.com/datasets/kaist-ai/Multifaceted-Collection-ORPO

下载链接

链接失效反馈

资源简介：

Multifaceted Collection ORPO是一个用于训练语言模型以生成符合多样化人类偏好的数据集。该数据集包含65k条独特的指令，每条指令对应一个系统消息，并选择符合该消息的响应作为chosen，选择不符合的响应作为rejected。数据集的结构包括多个字段，如main_source、original_source、system、prompt、chosen和rejected。数据集的创建过程包括指令采样、偏好集生成、系统消息和黄金响应生成等步骤。

Multifaceted Collection ORPO is a training dataset designed for aligning language models to diverse human preferences. It consists of 65k unique instructions, each associated with a system message, where the response aligned with the message is designated as chosen, and a response not aligned is designated as rejected. The dataset structure includes fields such as main_source, original_source, system, prompt, chosen, and rejected. The dataset creation process involves instruction sampling, preference set generation, system message and gold response generation, among other steps.

提供机构：

kaist-ai

原始信息汇总

数据集概述

数据集详情

数据集结构

数据字段

main_source (字符串): 指令的源数据集
original_source (字符串): 根据源数据集的指令原始来源
system (字符串): 系统消息，详细说明遵循个人多方面偏好的目标，涉及四个高层次的理想响应维度（风格、背景知识、信息量和无害性）
prompt (字符串): 指示特定任务的指令
chosen (列表[字典[字符串, 字符串]]): 最佳遵循系统消息和指令的黄金响应，由 gpt-4-0125-preview 生成。格式化为对话，即 [{"content": ..., "role": "user"}, {"content": ..., "role": "assistant"}]
rejected (列表[字典[字符串, 字符串]]): 最佳遵循不同系统消息和指令的黄金响应，由 gpt-4-0125-preview 生成。格式与 chosen 相同。

数据实例

以下是数据集中的一个示例实例：

json { "main_source": "OpenHermesPreferences", "original_source": "OpenHermesPreferences", "system": "You are a persuasive environmental advocate, tasked with enlightening individuals on the intricacies of climate change, making a compelling case for urgent collective action. Equipped with the assumption that your audience possesses a foundational, yet incomplete, understanding of climate change, you bridge knowledge gaps without burdening them with excessive complexity. Your narrative weaves together a concise yet comprehensive overview of climate change, including its causes, consequences, and humanitys role within this global issue, ensuring a balance between depth and brevity. With a keen sensitivity to environmental justice, you underscore how the ramifications of climate change disproportionately affect the most vulnerable communities, advocating for an inclusive approach that recognizes the interplay between environmental challenges and societal inequities. Your goal is to not only inform but also mobilize, harnessing the power of persuasive communication to foster a sense of responsibility and immediate action towards mitigating the effects of climate change.", "prompt": "Edit the given paragraph for grammar and style, making it more engaging and concise. Climate changes rapidly in response to alterations in Earths atmosphere. The activities of humans plays a large role in atmosphere changes, and thus leads to global warming. More particularly, the release in huge amounts of CO2 by burning fossil fuels, industrial processes, and deforestation are some of the main factors. The adverse consequences of global warming are numerous, affecting the environment and human life, including devastating weather events, ocean levels rising, and negatively impacting the habitat of many species.", "chosen": [ { "content": "Edit the given paragraph for grammar and style, making it more engaging and concise. Climate changes rapidly in response to alterations in Earths atmosphere. The activities of humans plays a large role in atmosphere changes, and thus leads to global warming. More particularly, the release in huge amounts of CO2 by burning fossil fuels, industrial processes, and deforestation are some of the main factors. The adverse consequences of global warming are numerous, affecting the environment and human life, including devastating weather events, ocean levels rising, and negatively impacting the habitat of many species.", "role": "user" }, { "content": "Climate change accelerates with shifts in Earths atmosphere, significantly driven by human activities. The burning of fossil fuels, industrial operations, and deforestation unleash vast amounts of CO2, propelling global warming. This warming leads to dire consequences for both the environment and humanity: devastating weather phenomena, rising ocean levels, and the jeopardization of numerous species habitats.", "role": "assistant" } ], "rejected": [ { "content": "Edit the given paragraph for grammar and style, making it more engaging and concise. Climate changes rapidly in response to alterations in Earths atmosphere. The activities of humans plays a large role in atmosphere changes, and thus leads to global warming. More particularly, the release in huge amounts of CO2 by burning fossil fuels, industrial processes, and deforestation are some of the main factors. The adverse consequences of global warming are numerous, affecting the environment and human life, including devastating weather events, ocean levels rising, and negatively impacting the habitat of many species.", "role": "user" }, { "content": "Climate swiftly shifts in response to changes in Earths atmosphere, with human activities at the heart of these transformations, fueling global warming. Notably, the massive release of CO2 through burning fossil fuels, industrial endeavors, and deforestation stands as a principal culprit. The fallout of global warming is vast, imperiling both the environment and human well-being. It manifests in catastrophic weather events, rising ocean levels, and the disruption of habitats for countless species.", "role": "assistant" } ] }

数据集创建

数据收集和处理

指令采样：
- 从五个高质量偏好数据集中选择指令：
- 移除完全重复项，并使用正则表达式模式过滤具有预先存在系统消息的指令，最终得到65k条指令。
偏好集生成：
- 初始识别四个主要响应偏好维度：风格、背景知识、信息量和无害性。
- 定义一个包含每个维度一个偏好的偏好集。
- 对于每个65k条指令，使用 gpt-4-0125-preview 生成3个不同的任务对齐偏好集。
系统消息和黄金响应生成：
- 使用GPT-4 Turbo将每个偏好集转换为系统消息，每个指令生成三个系统消息。
- 再次使用GPT-4 Turbo为每个系统消息制作黄金标准的多方面响应。

数据集摘要

Multifaceted Collection ORPO是一个适用于几率比偏好优化（ORPO）的训练数据集。该数据集包含65k条独特的指令，每个指令选择一个系统消息，并将对齐的响应标记为“chosen”，同时从其余两个非对齐系统消息中选择一个响应标记为“rejected”。

许可证：Creative Commons Attribution 4.0
相关数据集：Multifaceted-Collection, Multifaceted-Collection-DPO, Multifaceted-Collection-RM, Multifaceted-Bench
训练模型：Janus-ORPO-7B

数据集信息

特征：
- main_source (字符串)
- original_source (字符串)
- system (字符串)
- prompt (字符串)
- chosen (列表)
- rejected (列表)
分割：
- train：507621991字节，64641个样本
下载大小：285644717字节
数据集大小：507621991字节
配置：
- default：数据文件路径为 data/train-*
许可证：cc-by-4.0
任务类别：text-generation
大小类别：10K<n<100K

AI搜集汇总

数据集介绍

构建方式

Multifaceted Collection ORPO数据集的构建基于五个高质量的偏好数据集，包括Nectar、OpenHermesPreferences、UltraFeedback-binarized-clean、Chatbot Arena Conversations和Domain-Specific Preference dataset (DSP)。通过去除重复指令并使用正则表达式过滤，最终筛选出65,000条指令。每条指令通过GPT-4-Turbo-0125生成三条不同的系统消息，并进一步生成相应的黄金标准响应，形成'chosen'和'rejected'两类响应，以优化奇偶比偏好优化（ORPO）。

特点

该数据集的特点在于其多维度偏好捕捉能力，涵盖风格、背景知识、信息量和无害性四个主要维度。通过详细系统消息的引入，确保模型能够学习到细微的偏好差异。此外，数据集的构建过程中采用了分层偏好增强策略，从一般维度细化到具体子维度和偏好，增强了数据集的多样性和复杂性。

使用方法

Multifaceted Collection ORPO数据集适用于文本生成任务，特别是需要根据特定偏好生成响应的场景。用户可以通过HuggingFace平台下载该数据集，并利用其中的系统消息和黄金标准响应进行模型训练和评估。数据集的结构设计便于直接应用于现有的自然语言处理框架，支持对话生成、文本编辑等多种应用。

背景与挑战

背景概述

在自然语言处理领域，大型语言模型（LLMs）的性能提升依赖于高质量的数据集。Multifaceted Collection ORPO数据集由韩国科学技术院（KAIST）的AI实验室创建，旨在通过多样化的偏好数据集来优化LLMs的性能。该数据集的核心研究问题是如何使LLMs更好地理解和遵循人类的多样化偏好。通过整合来自五个现有数据集的指令，并使用GPT-4-Turbo-0125生成系统消息和参考答案，研究人员展示了该数据集在捕捉多种偏好方面的有效性。该数据集的创建不仅提升了LLMs在文本生成任务中的表现，还为相关领域的研究提供了宝贵的资源。

当前挑战

Multifaceted Collection ORPO数据集在构建过程中面临多个挑战。首先，整合来自多个数据集的指令并确保其多样性和高质量是一个复杂的过程。其次，生成与系统消息高度一致的参考答案需要先进的语言模型技术，如GPT-4-Turbo-0125，这增加了数据集构建的技术难度和成本。此外，确保数据集在不同偏好维度上的平衡性，以及避免潜在的偏见和风险，也是该数据集面临的重要挑战。这些挑战不仅影响了数据集的质量，也对其在实际应用中的效果提出了考验。

常用场景

经典使用场景

在自然语言处理领域，Multifaceted-Collection-ORPO数据集的经典使用场景主要集中在大型语言模型（LLMs）的偏好对齐任务中。该数据集通过系统消息和参考答案的生成，帮助模型理解和适应多样化的用户偏好。具体应用包括但不限于：通过对比‘chosen’和‘rejected’响应，训练模型生成更符合特定偏好和系统指令的文本；以及在多维度偏好分析中，评估和优化模型的响应质量。

实际应用

在实际应用中，Multifaceted-Collection-ORPO数据集被广泛用于开发和优化面向用户的对话系统和服务。例如，在客户服务聊天机器人中，该数据集帮助模型生成更符合用户期望和公司品牌形象的回复；在教育领域的智能助手中，通过理解和适应学生的学习风格和偏好，提供个性化的学习支持。这些应用显著提升了用户体验和服务效率。

衍生相关工作

基于Multifaceted-Collection-ORPO数据集，研究者们开发了多种相关工作，包括但不限于：Janus-ORPO-7B模型，该模型通过训练能够更好地理解和生成符合多维度偏好的文本；以及Multifaceted-Bench，一个用于评估模型在多维度偏好对齐任务中表现的基准测试。这些工作不仅扩展了数据集的应用范围，还为后续研究提供了丰富的实验平台和方法论支持。

以上内容由AI搜集并总结生成

用户留言

有没有相关的论文或文献参考？

这个数据集是基于什么背景创建的？

数据集的作者是谁？

能帮我联系到这个数据集的作者吗？

这个数据集如何下载？

点击留言

数据主题

具身智能

数据集 4099个

机构 8个

大模型

数据集 439个

机构 10个

无人机

数据集 37个

机构 6个

指令微调

数据集 36个

机构 6个

蛋白质结构

数据集 50个

机构 8个

空间智能

数据集 21个

机构 5个

5,000+

优质数据集

54 个

任务类型

进入经典数据集

热门数据集

中国区域交通网络数据集

该数据集包含中国各区域的交通网络信息，包括道路、铁路、航空和水路等多种交通方式的网络结构和连接关系。数据集详细记录了各交通节点的位置、交通线路的类型、长度、容量以及相关的交通流量信息。

data.stats.gov.cn 收录

PlantVillage

在这个数据集中，39 种不同类别的植物叶子和背景图像可用。包含 61,486 张图像的数据集。我们使用了六种不同的增强技术来增加数据集的大小。这些技术是图像翻转、伽玛校正、噪声注入、PCA 颜色增强、旋转和缩放。

OpenDataLab 收录

糖尿病预测数据集

糖尿病相关的医学研究或者健康数据

AI_Studio 收录

THCHS-30

“THCHS30是由清华大学语音与语言技术中心（CSLT）发布的开放式汉语语音数据库。原始录音是2002年在清华大学国家重点实验室的朱晓燕教授的指导下，由王东完成的。清华大学计算机科学系智能与系统，原名“TCMSD”，意思是“清华连续普通话语音数据库”，时隔13年出版，由王东博士发起，并得到了教授的支持。朱小燕。我们希望为语音识别领域的新研究人员提供一个玩具数据库。因此，该数据库对学术用户完全免费。整个软件包包含建立中文语音识别所需的全套语音和语言资源系统。”

OpenDataLab 收录

中国逐日格点降水数据集V2（1960–2024，0.1°）

CHM_PRE V2数据集是一套高精度的中国大陆逐日格点降水数据集。该数据集基于1960年至今共3476个观测站的长期日降水观测数据，并纳入11个降水相关变量，用于表征降水的相关性。数据集采用改进的反距离加权方法，并结合基于机器学习的LGBM算法构建。CHM_PRE V2与现有的格点降水数据集（包括CHM_PRE V1、GSMaP、IMERG、PERSIANN-CDR和GLDAS）表现出良好的时空一致性。数据集基于63,397个高密度自动雨量站2015–2019年的观测数据进行验证，发现该数据集显著提高了降水测量精度，降低了降水事件的高估，为水文建模和气候评估提供了可靠的基础。CHM_PRE V2 数据集提供分辨率为0.1°的逐日降水数据，覆盖整个中国大陆（18°N–54°N，72°E–136°E）。该数据集涵盖1960–2024年，并将每年持续更新。日值数据以NetCDF格式提供，为了方便用户，我们还提供NetCDF和GeoTIFF格式的年度和月度总降水数据。

国家青藏高原科学数据中心收录