BPO
Source: ModelScope community. Updated 2025-12-05; indexed 2024-08-31.
Download link: https://modelscope.cn/datasets/ZhipuAI/BPO
# Dataset Card for Black-box Prompt Optimization (BPO)
## Dataset Description
- **Repository:** https://github.com/thu-coai/BPO
- **Paper:** https://arxiv.org/abs/2311.04155
- **Model:**
### Data Summary
To advance alignment research for language models, we introduce Black-box Prompt Optimization (BPO), a black-box alignment method. BPO improves the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model, without training the target LLM. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. It comprises 14,395 prompt optimization pairs, constructed from open-source feedback data using OpenAI's `gpt-3.5-turbo` engine. We have worked to ensure the quality and diversity of the data through careful filtering and correction.
Please refer to our [paper](https://arxiv.org/abs/2311.04155) for further details.
### Supported Tasks
The BPO Dataset is designed for prompt optimization and prompt engineering tasks. Because we have carefully filtered and curated the dataset, it is also a good choice for SFT or RLHF.
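As a minimal sketch of how one record might be mapped onto these uses, the following assumes the four fields listed under Data Fields. The function names and the output keys (`input`, `target`, `chosen`, `rejected`) are illustrative choices, not a format defined by the BPO authors, and the sample record is abbreviated.

```python
from typing import Dict

# Hypothetical record carrying the four BPO fields (good_res abbreviated for illustration).
record = {
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "The minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]",
}


def to_prompt_optimization_pair(rec: Dict[str, str]) -> Dict[str, str]:
    """Map a record to an (input, target) pair for training a prompt optimizer."""
    return {"input": rec["prompt"], "target": rec["optimized_prompt"]}


def to_preference_pair(rec: Dict[str, str]) -> Dict[str, str]:
    """Map a record to a (prompt, chosen, rejected) triple for preference-based training."""
    return {
        "prompt": rec["prompt"],
        "chosen": rec["good_res"],
        "rejected": rec["bad_res"],
    }


sft_example = to_prompt_optimization_pair(record)
preference_example = to_preference_pair(record)
```

Whether the preference triple should be keyed on `prompt` or on `optimized_prompt` depends on the training setup; the original prompt is used here only as one plausible choice.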
### Language
The data in BPO are in English.
## Data Structure
### Data Instances
An example of "train" looks as follows:
```json
{
"prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
"optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
"good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10]. The minimized (least) value in this data set would be 3.",
"bad_res": "[3, 4, 6, 7, 8, 10]"
}
```
### Data Fields
The data fields are as follows:
* `prompt`: the original user input
* `optimized_prompt`: the optimized prompt based on feedback
* `good_res`: human-chosen (preferred) response
* `bad_res`: human-rejected response
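The field list above can be turned into a small sanity check before training. This is a sketch only: it assumes each record arrives as a JSON object string, and the helper name `missing_or_invalid_fields` is hypothetical rather than part of any BPO tooling.

```python
import json
from typing import List

# Field names taken from the Data Fields list in this card.
REQUIRED_FIELDS = ("prompt", "optimized_prompt", "good_res", "bad_res")


def missing_or_invalid_fields(raw_json: str) -> List[str]:
    """Return the required fields that are absent or are not non-empty strings."""
    rec = json.loads(raw_json)
    return [
        name
        for name in REQUIRED_FIELDS
        if not isinstance(rec.get(name), str) or not rec[name].strip()
    ]


# Two made-up records: one complete, one with an empty field and two missing fields.
complete = json.dumps({name: "text" for name in REQUIRED_FIELDS})
incomplete = json.dumps({"prompt": "hi", "optimized_prompt": ""})

print(missing_or_invalid_fields(complete))    # []
print(missing_or_invalid_fields(incomplete))  # ['optimized_prompt', 'good_res', 'bad_res']
```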
### Data Splits
| | train | valid | test |
|---------------|------:|------:|------:|
| BPO | 13895 | 300 | 200 |
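As a quick consistency check, the split sizes in the table can be summed and compared with the 14,395 pairs stated in the Data Summary. The snippet below simply hard-codes the numbers from the table.

```python
# Split sizes copied from the Data Splits table.
splits = {"train": 13895, "valid": 300, "test": 200}

total = sum(splits.values())  # 14395, matching the Data Summary

# Percentage share of each split, rounded to two decimals.
shares = {name: round(100 * count / total, 2) for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} examples ({shares[name]}%)")
```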
## Additional Information
### Source Data
Our feedback data come from the following open-source datasets:
- [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/comparison_data_v2.json)
- [Chatbot Arena Conversation](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
### Other Known Limitations
- Feedback Data Quality: Due to our use of open-source feedback data, some human preferences included may not be entirely accurate.
- Task Diversity: Despite our filtering efforts to keep the dataset diverse, these open-source datasets are not sufficient to cover the wide variety of user queries.
- Optimized Prompts: The optimized prompts are auto-generated by `gpt-3.5-turbo` based on feedback data. Even though we have manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.
### Citation Information
```
@article{cheng2023black,
title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
journal={arXiv preprint arXiv:2311.04155},
year={2023}
}
```
Provider: maas
Created: 2024-08-19



