BPO

Name: BPO
Creator: maas
Published: 2025-12-05 16:16:50
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2024-08-31.

Download link:

https://modelscope.cn/datasets/ZhipuAI/BPO

Resource description:

# Dataset Card for Black-box Prompt Optimization (BPO)

## Dataset Description

- **Repository:** https://github.com/thu-coai/BPO
- **Paper:** https://arxiv.org/abs/2311.04155
- **Model:**

### Data Summary

To advance the development of alignment in language models, we introduce a black-box alignment method. BPO enhances the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. This dataset comprises 14,395 prompt optimization pairs, constructed from open-source feedback data using OpenAI's `gpt-3.5-turbo` engine. We have ensured the quality and diversity of the data through careful filtering and correction. Please refer to our [paper](https://arxiv.org/abs/2311.04155) for further details.

### Supported Tasks

The BPO Dataset is designed for prompt optimization and prompt engineering tasks. Because we have carefully filtered and curated the dataset, it is also a good choice for SFT or RLHF.

### Language

The data in BPO are in English.

## Data Structure

### Data Instances

An example of "train" looks as follows:

```json
{
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]"
}
```

### Data Fields

The data fields are as follows:

* `prompt`: the original user input
* `optimized_prompt`: the optimized prompt based on feedback
* `good_res`: human-chosen (preferred) response
* `bad_res`: human-rejected response

### Data Splits

|               | train | valid |  test |
|---------------|------:|------:|------:|
| BPO           | 13895 |   300 |   200 |

## Additional Information

### Source Data

Our feedback data come from the following open-source datasets:

- [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/comparison_data_v2.json)
- [Chatbot Arena Conversation](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)

### Other Known Limitations

- Feedback Data Quality: Because we use open-source feedback data, some of the human preferences included may not be entirely accurate.
- Task Diversity: Despite our efforts to filter for a diverse dataset, these open-source datasets are clearly not sufficient to cover the wide variety of user queries.
- Optimized Prompts: The optimized prompts are auto-generated by `gpt-3.5-turbo` based on feedback data. Even though we have manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.

### Citation Information

```
@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
```
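
### Usage Sketch

As an illustration of the data fields and splits described in this card, the following minimal sketch shows one way a BPO record might be reshaped for the two uses the card mentions: training a prompt optimizer (`prompt` to `optimized_prompt`) and preference-style training (`good_res` versus `bad_res`). The helper names `to_optimizer_pair` and `to_preference_pair`, and the output keys `source`, `target`, `chosen`, and `rejected`, are assumptions made for this example and are not defined by the dataset or its repository. The record is the example instance from the card, embedded inline so the sketch runs without downloading anything.

```python
import json

# Example record copied from the "Data Instances" section of the card.
RECORD = json.loads(r'''
{
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]"
}
''')


def to_optimizer_pair(record):
    """Hypothetical helper: map the original prompt to its optimized version.

    The "source"/"target" keys are an assumption of this sketch.
    """
    return {"source": record["prompt"], "target": record["optimized_prompt"]}


def to_preference_pair(record):
    """Hypothetical helper: pair the preferred and rejected responses.

    The "chosen"/"rejected" keys are an assumption of this sketch.
    """
    return {
        "prompt": record["prompt"],
        "chosen": record["good_res"],
        "rejected": record["bad_res"],
    }


# Split sizes as listed in the "Data Splits" table.
SPLIT_SIZES = {"train": 13895, "valid": 300, "test": 200}
total = sum(SPLIT_SIZES.values())

print(json.dumps(to_optimizer_pair(RECORD), indent=2))
print(json.dumps(to_preference_pair(RECORD), indent=2))
print("total entries:", total)
```

The split sizes sum to the 14,395 entries stated in the Data Summary, which is a quick consistency check worth repeating on any downloaded copy.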


Provider: maas

Created: 2024-08-19
