THUDM/BPO

Name: THUDM/BPO
Creator: THUDM
Published: 2023-11-20 11:49:55
License: No description available

Source: Hugging Face. Updated: 2023-11-20. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/THUDM/BPO

Resource overview:

To advance language model alignment, we introduce a black-box alignment method, BPO. BPO improves the alignment of various large language models (LLMs) with human preferences using only plug-and-play models. To further support alignment research from the prompt perspective, we release the BPO dataset. It contains 14,395 prompt optimization pairs constructed with OpenAI's `gpt-3.5-turbo` engine, with careful filtering and correction to ensure the quality and diversity of the data.

Provider:

THUDM

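Original information summary

Dataset Card for Black-box Prompt Optimization (BPO)

Dataset Description

Data Summary

To advance the alignment of language models with human preferences, we introduce a black-box alignment method. BPO improves the alignment of various large language models (LLMs) with human preferences using only plug-and-play models. To further promote alignment work from the prompt perspective, we release the BPO dataset. It contains 14,395 prompt optimization pairs, constructed with OpenAI's `gpt-3.5-turbo` engine from open-source feedback data. We applied careful filtering and correction to ensure the quality and diversity of the data.

Supported Tasks

The BPO dataset is designed for prompt optimization and prompt engineering tasks. Because the dataset has been carefully filtered and curated, it is also a good choice for SFT or RLHF.

Languages

The data in the BPO dataset are in English.

Dataset Structure

Data Instances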

An example from "train" looks as follows (JSON):

    {
        "prompt": "Minimize the following data set [3, 4, 6, 7, 8, 10]",
        "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
        "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
        "bad_res": "[3, 4, 6, 7, 8, 10]"
    }

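Data Fields

The data fields are as follows:

prompt: the original user input
optimized_prompt: the prompt optimized based on feedback
good_res: the response chosen (preferred) by humans
bad_res: the response rejected by humans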
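
The four fields map naturally onto the two uses the card mentions: training a prompt rewriter (prompt to optimized_prompt) and preference-based training (good_res versus bad_res). The following is a minimal sketch, assuming each record arrives as one JSON object; the names `BPORecord`, `parse_record`, `to_rewrite_pair`, and `to_preference_pair`, as well as the output keys `input`/`target` and `chosen`/`rejected`, are illustrative choices and not part of the dataset or of any official BPO tooling.

```python
import json
from typing import TypedDict


class BPORecord(TypedDict):
    """One BPO record, using the four field names from the dataset card."""
    prompt: str
    optimized_prompt: str
    good_res: str
    bad_res: str


REQUIRED_FIELDS = ("prompt", "optimized_prompt", "good_res", "bad_res")


def parse_record(line: str) -> BPORecord:
    """Parse one JSON object and check that all four fields are strings."""
    obj = json.loads(line)
    for field in REQUIRED_FIELDS:
        if not isinstance(obj.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return BPORecord(**{f: obj[f] for f in REQUIRED_FIELDS})


def to_rewrite_pair(rec: BPORecord) -> dict:
    """Hypothetical layout for a prompt rewriter: original -> optimized prompt."""
    return {"input": rec["prompt"], "target": rec["optimized_prompt"]}


def to_preference_pair(rec: BPORecord) -> dict:
    """Hypothetical prompt/chosen/rejected layout for preference training."""
    return {
        "prompt": rec["prompt"],
        "chosen": rec["good_res"],
        "rejected": rec["bad_res"],
    }
```

Keeping validation in `parse_record` means malformed records fail early with a clear error, before they reach either conversion.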

Data Splits

	train	valid	test
BPO	13895	300	200
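
As a quick consistency check, the three split sizes should add up to the 14,395 pairs stated in the summary. The sketch below uses only the numbers from the table; the names `SPLITS`, `TOTAL_PAIRS`, and `split_percentages` are illustrative.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 13895, "valid": 300, "test": 200}
# Total number of prompt optimization pairs stated in the data summary.
TOTAL_PAIRS = 14395


def split_percentages(splits: dict) -> dict:
    """Return each split's share of the total, in percent, rounded to 0.1."""
    total = sum(splits.values())
    return {name: round(100 * count / total, 1) for name, count in splits.items()}


# The splits account for every pair in the dataset.
assert sum(SPLITS.values()) == TOTAL_PAIRS

if __name__ == "__main__":
    print(split_percentages(SPLITS))
```

The train split holds roughly 96.5% of the pairs, with about 2.1% for validation and 1.4% for testing.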

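Additional Information

Source Data

Our feedback data come from the following open-source datasets:

Other Known Limitations

Feedback data quality: Because we use open-source feedback data, the human preferences it contains may not be fully accurate.
Task diversity: Although we made an effort to filter the data for diversity, these open-source datasets clearly do not cover the full range of query types.
Optimized prompts: The optimized prompts were generated automatically by gpt-3.5-turbo from the feedback data. Although we manually reviewed and revised the dataset, we cannot guarantee that every prompt optimization is correct.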

Citation Information

@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}

Collected summary

Dataset introduction

Background and challenges

Background overview

The BPO Dataset is a curated collection of 14,395 prompt optimization pairs aimed at improving LLM alignment with human preferences. It features original and optimized prompts alongside preferred and rejected responses, sourced from open feedback data and refined for quality. The dataset supports tasks like prompt engineering and model fine-tuning, with splits for training (13,895 entries), validation (300), and testing (200). It is entirely in English and accompanied by a research paper detailing its development and use.

The content above was collected and summarized by 遇见数据集.
