
BPO

ModelScope community, updated 2025-12-05, indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/ZhipuAI/BPO
Resource description:
# Dataset Card for Black-box Prompt Optimization (BPO)

## Dataset Description

- **Repository:** https://github.com/thu-coai/BPO
- **Paper:** https://arxiv.org/abs/2311.04155
- **Model:**

### Data Summary

To advance the development of alignment in language models, we introduce a black-box alignment method. BPO enhances the alignment of various Large Language Models (LLMs) with human preferences using only a plug-and-play model. To further promote alignment work from the prompting perspective, we are releasing the BPO Dataset. This dataset comprises 14,395 prompt optimization pairs, constructed from open-source feedback data using OpenAI's `gpt-3.5-turbo` engine. We have ensured the quality and diversity of the data through careful filtering and correction.

Please refer to our [paper](https://arxiv.org/abs/2311.04155) for further details.

### Supported Tasks

The BPO Dataset is designed for prompt optimization and prompt engineering tasks. Because the dataset has been carefully filtered and curated, it is also a good choice for SFT or RLHF.

### Language

The data in BPO are in English.

## Data Structure

### Data Instances

An example of "train" looks as follows:

```json
{
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": "Provide a step-by-step solution to minimize the following data set: [3, 4, 6, 7, 8, 10].",
    "good_res": "To minimize the values in the given data set, which is a list of numbers, you could sort them in ascending order as follows: [3, 4, 6, 7, 8, 10].\nThe minimized (least) value in this data set would be 3.",
    "bad_res": "[3, 4, 6, 7, 8, 10]"
}
```

### Data Fields

The data fields are as follows:

* `prompt`: the original user input
* `optimized_prompt`: the optimized prompt based on feedback
* `good_res`: human-chosen (preferred) response
* `bad_res`: human-rejected response

### Data Splits

|     | train | valid | test |
|-----|------:|------:|-----:|
| BPO | 13895 |   300 |  200 |

## Additional Information

### Source Data

Our feedback data come from the following open-source datasets:

- [OASST1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/comparison_data_v2.json)
- [Chatbot Arena Conversation](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)

### Other Known Limitations

- Feedback Data Quality: Because we rely on open-source feedback data, some of the included human preferences may not be entirely accurate.
- Task Diversity: Despite our efforts to filter for a diverse dataset, these open-source datasets are clearly not sufficient to cover the wide variety of user queries.
- Optimized Prompts: The optimized prompts are auto-generated by `gpt-3.5-turbo` based on feedback data. Even though we have manually reviewed and modified the dataset, we cannot guarantee that all prompt optimizations are correct.

### Citation Information

```
@article{cheng2023black,
  title={Black-Box Prompt Optimization: Aligning Large Language Models without Model Training},
  author={Cheng, Jiale and Liu, Xiao and Zheng, Kehan and Ke, Pei and Wang, Hongning and Dong, Yuxiao and Tang, Jie and Huang, Minlie},
  journal={arXiv preprint arXiv:2311.04155},
  year={2023}
}
```
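
### Example: Reading a BPO Record

The four fields described in the card can support both uses it names: training a prompt optimizer (SFT) and preference-based training (RLHF). The following is a minimal sketch, not the authors' pipeline. It assumes each record is available as one JSON object per line, which the card does not state, and the helper names `parse_record`, `to_optimizer_pair`, and `to_preference_triple`, along with the output keys `input`, `target`, `chosen`, and `rejected`, are hypothetical. It also checks that the split sizes from the table add up to the 14,395 entries quoted in the summary.

```python
import json

# Field names taken from the "Data Fields" section of the card.
REQUIRED_FIELDS = ("prompt", "optimized_prompt", "good_res", "bad_res")

# Split sizes as listed in the "Data Splits" table of the card.
SPLIT_SIZES = {"train": 13895, "valid": 300, "test": 200}


def parse_record(line: str) -> dict:
    """Parse one JSON record and check that all four card fields are present."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    return record


def to_optimizer_pair(record: dict) -> dict:
    """Hypothetical SFT view: original prompt in, optimized prompt out."""
    return {"input": record["prompt"], "target": record["optimized_prompt"]}


def to_preference_triple(record: dict) -> dict:
    """Hypothetical preference view: prompt with chosen and rejected responses."""
    return {
        "prompt": record["prompt"],
        "chosen": record["good_res"],
        "rejected": record["bad_res"],
    }


# The sample instance from the card, serialized as a single JSON line
# (the one-object-per-line storage format is an assumption).
sample_line = json.dumps({
    "prompt": "Minimize the following data set\n[3, 4, 6, 7, 8, 10]",
    "optimized_prompt": (
        "Provide a step-by-step solution to minimize the following "
        "data set: [3, 4, 6, 7, 8, 10]."
    ),
    "good_res": (
        "To minimize the values in the given data set, which is a list of "
        "numbers, you could sort them in ascending order as follows: "
        "[3, 4, 6, 7, 8, 10].\nThe minimized (least) value in this data set "
        "would be 3."
    ),
    "bad_res": "[3, 4, 6, 7, 8, 10]",
})

record = parse_record(sample_line)
pair = to_optimizer_pair(record)
triple = to_preference_triple(record)
total = sum(SPLIT_SIZES.values())

print(pair["target"])
print(total)
```

The optimizer pair ignores the two responses, while the preference triple ignores `optimized_prompt`. Which view is appropriate depends on whether the goal is to rewrite prompts or to learn from the chosen and rejected responses.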

Provider:
maas
Created:
2024-08-19