
CCCCCC/BPO

Source: Hugging Face. Updated 2023-11-20; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/CCCCCC/BPO
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- human_feedback
size_categories:
- 10K<n<100K
pretty_name: BPO
---

# Dataset Card for Black-box Prompt Optimization (BPO)

## Dataset Description

- **Repository:** https://github.com/thu-coai/BPO
- **Paper:** https://arxiv.org/abs/2311.04155
- **Model:**

### Data Summary

To advance the development of alignment in language models, we introduce a black-box alignment method. BPO enhances the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. This dataset comprises 14,395 prompt optimization pairs, constructed from open-source feedback data with OpenAI's `gpt-3.5-turbo` engine. We have ensured the quality and diversity of the data through careful filtering and correction. Please refer to our [paper](https://arxiv.org/abs/2311.04155) for further details.

### Supported Tasks

The BPO Dataset is designed for prompt optimization and prompt engineering tasks. Because the dataset has been carefully filtered and curated, it is also a good choice for SFT or RLHF.

### Language

The data in BPO are in English.

## Data Structure

### Data Instances

An example of "train" looks as follows:

```json
{
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]"
}
```

### Data Fields

The data fields are as follows:

* `prompt`: the original user input
* `optimized_prompt`: the optimized prompt based on feedback
* `good_res`: human-chosen (preferred) response
* `bad_res`: human-rejected response

### Data Splits

|     | train | valid | test |
|-----|------:|------:|-----:|
| BPO | 13895 |   300 |  200 |

## Additional Information

### Source Data

Our feedback data come from the following open-source datasets:

- [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/comparison_data_v2.json)
- [Chatbot Arena Conversation](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)

### Other Known Limitations

- Feedback Data Quality: Because we use open-source feedback data, some of the human preferences included may not be entirely accurate.
- Task Diversity: Despite our efforts to filter for a diverse dataset, these open-source datasets are not sufficient to cover the wide variety of user queries.
- Optimized Prompts: The optimized prompts are auto-generated by `gpt-3.5-turbo` based on feedback data. Even though we have manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.

### Citation Information

```
@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
```
Provided by:
CCCCCC
Summary of original information

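Dataset Card for Black-box Prompt Optimization (BPO)

Dataset Description

Data Summary

To advance the development of alignment in language models, we introduce a black-box alignment method. BPO enhances the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. This dataset comprises 14,395 prompt optimization pairs, constructed from open-source feedback data using OpenAI's gpt-3.5-turbo engine. We ensured the quality and diversity of the data through careful filtering and correction.

Supported Tasks

The BPO Dataset is designed for prompt optimization and prompt engineering tasks. Because the dataset has been carefully filtered and curated, it is also a good choice for SFT or RLHF.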
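The three uses named above (prompt optimization, SFT, RLHF) each draw on a different subset of a record's fields. The following is a minimal sketch of that mapping, not the authors' training pipeline. The input field names (`prompt`, `optimized_prompt`, `good_res`, `bad_res`) come from the dataset card; the function names and the output keys (`source`, `target`, `instruction`, `output`, `chosen`, `rejected`) are hypothetical and should be adapted to whatever training framework is used.

```python
from typing import Dict

Record = Dict[str, str]


def to_prompt_optimization_pair(record: Record) -> Dict[str, str]:
    """Pair the original prompt with its optimized rewrite."""
    return {"source": record["prompt"], "target": record["optimized_prompt"]}


def to_sft_example(record: Record) -> Dict[str, str]:
    """Pair the original prompt with the human-preferred response."""
    return {"instruction": record["prompt"], "output": record["good_res"]}


def to_preference_triple(record: Record) -> Dict[str, str]:
    """Build a (prompt, chosen, rejected) triple for preference-based training."""
    return {
        "prompt": record["prompt"],
        "chosen": record["good_res"],
        "rejected": record["bad_res"],
    }


# The example record from the dataset card.
example: Record = {
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": (
        "Provide a step-by-step solution to minimize the following data set: "
        "[3, 4, 6, 7, 8, 10]."
    ),
    "good_res": (
        "To minimize the values in the given data set, which is a list of "
        "numbers, you could sort them in ascending order as follows: "
        "[3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set "
        "would be 3."
    ),
    "bad_res": "[3, 4, 6, 7, 8, 10]",
}

pair = to_prompt_optimization_pair(example)
sft = to_sft_example(example)
triple = to_preference_triple(example)
```

Keeping the three conversions as separate functions makes it explicit which fields each use depends on: only the preference triple needs `bad_res`, and only the prompt optimization pair needs `optimized_prompt`.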

Language

The data in BPO are in English.

Data Structure

Data Instances

An example of "train" looks as follows:

{
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]"
}

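Data Fields

The data fields are as follows:

  • prompt: the original user input
  • optimized_prompt: the optimized prompt based on feedback
  • good_res: human-chosen (preferred) response
  • bad_res: human-rejected response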
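The four fields listed above can be checked mechanically before a record is used. The sketch below parses one record and verifies that every field is present and is a string. It assumes each record is available as a JSON object string, which the card does not state; the `BPORecord` class and `parse_record` helper are hypothetical names introduced only for this illustration.

```python
import json
from dataclasses import dataclass

# Field names as given in the dataset card.
FIELDS = ("prompt", "optimized_prompt", "good_res", "bad_res")


@dataclass(frozen=True)
class BPORecord:
    prompt: str            # original user input
    optimized_prompt: str  # prompt rewritten based on feedback
    good_res: str          # human-chosen (preferred) response
    bad_res: str           # human-rejected response


def parse_record(text: str) -> BPORecord:
    """Parse one JSON object and check that all four string fields exist."""
    raw = json.loads(text)
    missing = [name for name in FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in FIELDS:
        if not isinstance(raw[name], str):
            raise TypeError(f"field {name!r} must be a string")
    return BPORecord(**{name: raw[name] for name in FIELDS})


# The example record from the dataset card, serialized to JSON.
example_json = json.dumps({
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": (
        "Provide a step-by-step solution to minimize the following data set: "
        "[3, 4, 6, 7, 8, 10]."
    ),
    "good_res": (
        "To minimize the values in the given data set, which is a list of "
        "numbers, you could sort them in ascending order as follows: "
        "[3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set "
        "would be 3."
    ),
    "bad_res": "[3, 4, 6, 7, 8, 10]",
})

record = parse_record(example_json)
```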

Data Splits

|     | train | valid | test |
|-----|------:|------:|-----:|
| BPO | 13895 |   300 |  200 |
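As a quick consistency check, the three split sizes should add up to the 14,395 pairs reported in the data summary. The short sketch below performs that arithmetic and computes each split's share of the total; the variable names are illustrative only.

```python
# Split sizes and total as reported in the dataset card.
SPLIT_SIZES = {"train": 13895, "valid": 300, "test": 200}
REPORTED_TOTAL = 14395

total = sum(SPLIT_SIZES.values())
if total != REPORTED_TOTAL:
    raise ValueError(f"splits sum to {total}, expected {REPORTED_TOTAL}")

# Share of the full dataset held by each split.
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}
for name, share in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]} records ({share:.2%})")
```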

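Additional Information

Source Data

Our feedback data come from the following open-source datasets:

  • OASST1
  • hh-rlhf
  • Alpaca-GPT4
  • Chatbot Arena Conversation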

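Other Known Limitations

  • Feedback data quality: Because we use open-source feedback data, some of the human preferences included may not be entirely accurate.
  • Task diversity: Despite our efforts to filter for a diverse dataset, these open-source datasets are not sufficient to cover the wide variety of query types.
  • Optimized prompts: The optimized prompts are auto-generated by gpt-3.5-turbo based on feedback data. Although we manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.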

Citation Information

@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
