CCCCCC/BPO

Name: CCCCCC/BPO
Creator: CCCCCC
Published: 2023-11-20 05:42:13
License: No description available

Hugging Face | updated 2023-11-20 | indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/CCCCCC/BPO

Resource summary:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- human_feedback
size_categories:
- 10K<n<100K
pretty_name: BPO
---

# Dataset Card for Black-box Prompt Optimization (BPO)

## Dataset Description

- **Repository:** https://github.com/thu-coai/BPO
- **Paper:** https://arxiv.org/abs/2311.04155
- **Model:**

### Data Summary

To advance the development of alignment in language models, we introduce a black-box alignment method. BPO enhances the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. This dataset comprises 14,395 entries of prompt optimization pairs, constructed using open-source feedback data with OpenAI's `gpt-3.5-turbo` engine. We have thoroughly ensured the quality and diversity of the data through careful filtering and correction. Please refer to our [paper](https://arxiv.org/abs/2311.04155) for further details.

### Supported Tasks

The BPO Dataset is designed for prompt optimization / prompt engineering tasks. Because we have carefully filtered and curated the dataset, it is also a good choice for SFT or RLHF.

### Language

The data in BPO are in English.

## Data Structure

### Data Instances

An example of "train" looks as follows:

```json
{
  "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
  "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
  "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
  "bad_res": "[3, 4, 6, 7, 8, 10]"
}
```

### Data Fields

The data fields are as follows:

* `prompt`: the original user input
* `optimized_prompt`: the optimized prompt based on feedback
* `good_res`: human-chosen (preferred) response
* `bad_res`: human-rejected response

### Data Splits

|     | train | valid | test |
|-----|------:|------:|-----:|
| BPO | 13895 |   300 |  200 |

## Additional Information

### Source Data

Our feedback data come from the following open-source datasets:

- [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/comparison_data_v2.json)
- [Chatbot Arena Conversation](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)

### Other Known Limitations

- Feedback Data Quality: Because we use open-source feedback data, some of the included human preferences may not be entirely accurate.
- Task Diversity: Despite our efforts to filter the data and achieve a diverse dataset, these open-source datasets are clearly not sufficient to cover the wide variety of user queries.
- Optimized Prompts: The optimized prompts are auto-generated by `gpt-3.5-turbo` based on feedback data. Even though we have manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.

### Citation Information

```
@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
```

Provided by:

CCCCCC

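Summary of the original information

Dataset Card for Black-box Prompt Optimization (BPO)

Dataset Description

Data Summary

To advance the development of alignment techniques for language models, we introduce a black-box alignment method. Using only a plug-and-play model, BPO improves the alignment of various large language models (LLMs) with human preferences. To further advance alignment work from the prompt-optimization perspective, we are releasing the BPO dataset. The dataset contains 14,395 prompt optimization pairs, built from open-source feedback data with OpenAI's gpt-3.5-turbo engine. We ensured the quality and diversity of the data through careful filtering and correction.

Supported Tasks

The BPO dataset is designed for prompt optimization / prompt engineering tasks. Because we carefully filtered and curated it, it is also a good choice for SFT or RLHF.

Language

The data in the BPO dataset are in English.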

Data Structure

Data Instances

An example from the "train" split looks as follows:

    {
      "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
      "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
      "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
      "bad_res": "[3, 4, 6, 7, 8, 10]"
    }
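The example record above is plain JSON, so Python's standard `json` module can parse it. The minimal sketch below does that and checks that the record carries exactly the four documented fields, all as strings. The helper name `parse_record` is hypothetical. It is not part of the BPO repository or of any loader the card describes.

```python
import json

# The "train" example from the card, kept as raw JSON text.
# The raw string keeps "\n" as a JSON escape, which json.loads turns into a newline.
EXAMPLE_JSON = r"""
{
  "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
  "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
  "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
  "bad_res": "[3, 4, 6, 7, 8, 10]"
}
"""

EXPECTED_FIELDS = {"prompt", "optimized_prompt", "good_res", "bad_res"}


def parse_record(text: str) -> dict:
    """Hypothetical helper: parse one record and check it has exactly the four fields."""
    record = json.loads(text)
    keys = set(record)
    missing = EXPECTED_FIELDS - keys
    extra = keys - EXPECTED_FIELDS
    if missing or extra:
        raise ValueError(f"unexpected schema: missing={sorted(missing)}, extra={sorted(extra)}")
    for key in EXPECTED_FIELDS:
        if not isinstance(record[key], str):
            raise TypeError(f"field {key!r} should be a string")
    return record


record = parse_record(EXAMPLE_JSON)
print(record["prompt"].splitlines()[0])  # Minimize the following data set
```

The check is strict on purpose. A record with a missing or extra key raises immediately, so it does not silently flow into training.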

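Data Fields

The data fields are as follows:

prompt: the original user input
optimized_prompt: the prompt optimized based on feedback
good_res: the human-chosen (preferred) response
bad_res: the human-rejected response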
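The fields map naturally onto the uses named under Supported Tasks. `prompt` → `optimized_prompt` is an input/output pair for training a prompt optimizer. `good_res` / `bad_res` form a chosen/rejected preference pair for RLHF-style training. The sketch below shows that mapping. The helper names and the output key names (`input`, `output`, `chosen`, `rejected`) are assumptions: a common layout for such trainers, not a format the card prescribes.

```python
from typing import Dict, TypedDict


class BPORecord(TypedDict):
    """One BPO entry with the four documented fields."""
    prompt: str            # original user input
    optimized_prompt: str  # prompt rewritten based on feedback
    good_res: str          # human-preferred response
    bad_res: str           # human-rejected response


def to_sft_example(record: BPORecord) -> Dict[str, str]:
    """Hypothetical helper: original prompt -> optimized prompt (prompt-optimizer training pair)."""
    return {"input": record["prompt"], "output": record["optimized_prompt"]}


def to_preference_pair(record: BPORecord) -> Dict[str, str]:
    """Hypothetical helper: (prompt, chosen, rejected) triple; key names are an assumption."""
    return {
        "prompt": record["prompt"],
        "chosen": record["good_res"],
        "rejected": record["bad_res"],
    }


# The "train" example from the card.
example: BPORecord = {
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": (
        "Provide a step-by-step solution to minimize the following data set: "
        "[3, 4, 6, 7, 8, 10]."
    ),
    "good_res": (
        "To minimize the values in the given data set, which is a list of numbers, "
        "you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. "
        "The minimized (least) value in this data set would be 3."
    ),
    "bad_res": "[3, 4, 6, 7, 8, 10]",
}

print(to_sft_example(example)["output"])
print(to_preference_pair(example)["rejected"])
```

Keeping the two conversions separate lets the same records feed either a supervised prompt-rewriting model or a preference-learning setup without duplicating the data.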

Data Splits

	train	valid	test
BPO	13895	300	200
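As a quick arithmetic check, the three split sizes in the table should add up to the 14,395 entries stated in the data summary. The short sketch below confirms that and reports each split's share of the whole. The function name `split_shares` is illustrative only.

```python
# Split sizes copied from the table above.
SPLITS = {"train": 13895, "valid": 300, "test": 200}


def split_shares(splits):
    """Return each split's fraction of the total number of entries."""
    total = sum(splits.values())
    return {name: size / total for name, size in splits.items()}


total = sum(SPLITS.values())
print(total)  # 14395, matching the size stated in the data summary
for name, share in split_shares(SPLITS).items():
    print(f"{name}: {share:.1%}")
```

With these numbers the script prints 14395, then train: 96.5%, valid: 2.1%, test: 1.4%. Almost all of the data is for training, and the held-out splits are small.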

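Additional Information

Source Data

Our feedback data come from the following open-source datasets:
OASST1
hh-rlhf
Alpaca-GPT4
Chatbot Arena Conversation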

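Other Known Limitations

Feedback data quality: Because we use open-source feedback data, some of the included human preferences may not be entirely accurate.
Task diversity: Although we worked to filter the data and make the dataset diverse, these open-source datasets are clearly not sufficient to cover the wide range of user query types.
Optimized prompts: The optimized prompts were generated automatically by gpt-3.5-turbo from the feedback data. Although we manually reviewed and revised the dataset, we cannot guarantee that every prompt optimization is correct.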
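Given these limitations, a user may want to flag obviously degenerate records before training. The sketch below applies a few cheap checks: an empty optimized prompt, an optimized prompt that only differs from the original in whitespace, and identical preferred and rejected responses. These heuristics are hypothetical. They are not the authors' filtering procedure, and they cannot detect a human preference that is simply wrong.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting-only differences are ignored."""
    return " ".join(text.split())


def flag_suspicious(record: Dict[str, str]) -> List[str]:
    """Hypothetical heuristics: return reasons a record may deserve manual review."""
    reasons = []
    prompt = normalize(record["prompt"])
    optimized = normalize(record["optimized_prompt"])
    if not optimized:
        reasons.append("empty optimized_prompt")
    elif optimized == prompt:
        reasons.append("optimized_prompt unchanged from prompt")
    if record["good_res"].strip() == record["bad_res"].strip():
        reasons.append("good_res and bad_res are identical")
    return reasons


clean = {
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]",
}
# A made-up record that trips both checks.
degenerate = {
    "prompt": "Sort this list",
    "optimized_prompt": "Sort  this list",
    "good_res": "ok",
    "bad_res": " ok ",
}

print(flag_suspicious(clean))       # []
print(flag_suspicious(degenerate))  # two reasons
```

Flagged records are best routed to manual review rather than dropped automatically. An unchanged prompt can still be a legitimate case where no optimization was needed.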

Citation Information

@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
