kaist-ai/Multifaceted-Collection

Hugging Face2024-06-07 更新2024-06-15 收录

下载链接：

https://hf-mirror.com/datasets/kaist-ai/Multifaceted-Collection

下载链接

链接失效反馈

资源简介：

Multifaceted Collection是一个偏好数据集，用于对齐大型语言模型（LLMs）以适应多样的人类偏好。数据集包含65k个独特的指令，每个指令伴随3个表示不同偏好的系统消息和相应的响应，总计196k个实例。数据集的创建旨在解决现有对齐数据集的局限性，捕捉跨多个维度的细粒度偏好。数据集的主要字段包括`main_source`（指令的来源数据集）、`original_source`（指令的原始来源）、`preference_set`（偏好集，包含维度、子维度和具体偏好）、`system`（系统消息，详细说明要遵循的个体偏好）、`prompt`（指示特定任务的指令）和`output`（最佳响应，由GPT-4生成）。

Multifaceted Collection is a preference dataset designed for aligning Large Language Models (LLMs) with diverse human preferences. The dataset comprises 65k unique instructions, each paired with three system messages representing distinct preferences and their corresponding responses, totaling 196k instances. It was developed to address the limitations of existing alignment datasets and capture fine-grained preferences across multiple dimensions. The primary fields of the dataset include `main_source` (the source dataset of the instruction), `original_source` (the original source of the instruction), `preference_set` (the preference set, which encompasses dimensions, sub-dimensions and specific preferences), `system` (the system message detailing the individual preference to follow), `prompt` (the instruction specifying a particular task), and `output` (the optimal response generated by GPT-4).

提供机构：

kaist-ai

原始信息汇总

数据集卡片 for Multifaceted Collection

数据集详情

数据集概述

Multifaceted Collection 是一个用于对齐大型语言模型（LLMs）与多样化人类偏好的偏好数据集，其中系统消息用于表示个人偏好。该数据集包含65k条独特的指令，每条指令伴随3条表示不同偏好的系统消息及相应响应，总计196k个实例。

许可证: Creative Commons Attribution 4.0
相关数据集: Multifaceted-Collection-small, Multifaceted-Collection-DPO, Multifaceted-Collection-ORPO, Multifaceted-Collection-RM, Multifaceted-Bench
训练模型: Janus 7B

数据集结构

数据实例

以下是数据集中的一个示例实例：

json { "main_source": "OpenHermesPreferences", "original_source": "CamelAI", "preference_set": [ { "dimension": "style", "subdimension": "vividness", "preference": "richly descriptive", "description": "The preference leans towards using a richly descriptive style that vividly illustrates the scenario of the game, including sensory details and metaphors to make the theoretical concepts more tangible and engaging." }, { "dimension": "background_knowledge", "subdimension": "expert", "preference": "advanced mathematical concepts", "description": "This preference suggests tailoring responses to include advanced mathematical concepts and terminologies, assuming the user has a strong background in mathematics or game theory." }, { "dimension": "informativeness", "subdimension": "practicality", "preference": "application-focused", "description": "The preference is for responses that not only explain the expected value in the context of the given game but also extend the discussion to practical applications of these principles in real-life situations or other games." }, { "dimension": "harmlessness", "subdimension": "morality", "preference": "ethical considerations", "description": "A preference for integrating ethical considerations into the response, reflecting on the fairness and morality of using misleading tactics like the double-headed or double-tailed pennies in games or any competitive scenario." } ], "system": "Envision yourself as a masterful game theorist and strategist, weaving advanced mathematical concepts into the rich tapestry of game scenarios.", "prompt": "Two players, A and B, are playing a game of matching pennies.", "output": "Lets dissect the scenario with the artistry of game theory, crafting a vivid tableau from what seems a simple pas de deux of coin flips." }

数据字段

main_source (str): 指令的源数据集
original_source (str): 根据源数据集的指令原始来源
preference_set (List[Dict[str, str]]): 偏好集，构成系统消息的基础。每个偏好集包含四个高级维度（风格、背景知识、信息量和无害性）的偏好，按维度、子维度和特定偏好的顺序指定。
system (str): 系统消息，详细说明遵循个人多方面偏好的目标。
prompt (str): 指示特定任务的指令
output (str): 最佳遵循系统消息和指令的黄金响应，由 gpt-4-0125-preview 生成

数据集创建

数据集创建理由

Multifaceted Collection 数据集旨在解决现有对齐数据集的局限性，通过捕捉跨多个维度的细粒度偏好。我们将偏好概念化为一个详细的文本描述，说明一个理想的响应应具备的质量。

数据收集和处理

1. 指令采样

从五个高质量偏好数据集中选择指令：

2. 偏好集生成

我们最初确定了四个主要维度：风格、背景知识、信息量和无害性。然后定义了一个包含每个维度一个偏好的偏好集。

3. 系统消息和黄金响应生成

使用 GPT-4 Turbo 将每个偏好集转换为系统消息，并为每个系统消息生成黄金标准的多样化响应。

AI搜集汇总

数据集介绍

构建方式

Multifaceted Collection数据集的构建过程体现了对多样化人类偏好的精细捕捉。该数据集从五个高质量的偏好数据集中选取指令，经过去重和过滤后，形成了65,000条独特的指令。每条指令通过GPT-4-0125-preview生成三个不同的偏好集，每个偏好集涵盖风格、背景知识、信息量和无害性四个维度。随后，利用GPT-4 Turbo将这些偏好集转化为系统消息，并生成相应的黄金标准响应，确保每条指令都有三个系统消息和对应的响应，最终形成了196,000个实例。

使用方法

Multifaceted Collection数据集适用于监督微调（SFT）任务，特别适合用于训练语言模型以生成符合特定偏好和系统消息的响应。用户可以通过HuggingFace平台下载该数据集，并根据提供的系统消息和偏好集进行模型训练。数据集的结构清晰，包含了指令、系统消息、偏好集和对应的响应，用户可以根据需要选择不同的配置进行训练。此外，数据集还提供了详细的文档和示例，帮助用户更好地理解和使用数据集。

背景与挑战

背景概述

Multifaceted Collection数据集由韩国KAIST大学的研究人员创建，旨在解决现有对齐数据集在捕捉多维度细粒度偏好方面的局限性。该数据集的核心研究问题是如何通过系统消息来表达和训练语言模型以适应多样化的用户偏好。通过整合来自五个高质量偏好数据集的指令，研究人员生成了65,000条独特的指令，每条指令伴随3条不同的系统消息，代表了不同的偏好设置。该数据集的创建不仅提升了语言模型在生成符合特定偏好响应方面的能力，还为未来的语言模型对齐研究提供了重要的资源。

当前挑战

Multifaceted Collection数据集在构建过程中面临多项挑战。首先，如何从多个现有数据集中筛选和整合指令，确保指令的多样性和高质量，是一个复杂的过程。其次，生成多维度的偏好设置并将其转化为系统消息，需要精确的文本描述和生成技术，以确保模型能够理解和学习这些偏好。此外，数据集的规模和复杂性也带来了存储和处理上的挑战，尤其是在生成和存储大量系统消息和响应时。最后，如何确保数据集在不同应用场景下的通用性和有效性，也是一个需要深入研究的问题。

常用场景

经典使用场景

Multifaceted Collection数据集的经典使用场景主要集中在对大型语言模型（LLMs）进行监督微调（SFT），以使其能够更好地理解和遵循多样化的用户偏好。该数据集通过提供65,000条独特的指令，每条指令附带3条不同的系统消息，分别代表不同的偏好，并生成相应的参考答案。这种设计使得模型能够在多种偏好维度上进行训练，从而生成更符合用户期望的输出。

解决学术问题

Multifaceted Collection数据集解决了现有对齐数据集在捕捉细粒度偏好方面的局限性。通过引入多维度偏好（如风格、背景知识、信息量和无害性），该数据集帮助模型更好地理解并生成符合特定偏好的响应。这一创新不仅提升了模型的响应质量，还为研究者提供了一个强大的工具，用于探索和验证语言模型在复杂偏好情境下的表现。

实际应用

在实际应用中，Multifaceted Collection数据集可用于开发更智能、更人性化的对话系统。例如，在客户服务领域，该数据集可以帮助训练模型根据用户的语言风格和信息需求生成更合适的回复。此外，在教育领域，该数据集可用于构建能够根据学生学习风格和知识水平调整教学内容的智能辅导系统，从而提升用户体验和学习效果。

数据集最近研究