zhengr/ultrafeedback_binarized

Name: zhengr/ultrafeedback_binarized
Creator: zhengr
Published: 2023-11-08 14:18:27
License: No description available

Hugging Face | Updated 2023-11-08 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/zhengr/ultrafeedback_binarized

Resource summary:

UltraFeedback Binarized is a preprocessed version of the UltraFeedback dataset that was used to train the Zephyr-7B-β model. The original UltraFeedback dataset contains 64k prompts, each with four model-generated completions, and GPT-4 scores each completion on criteria such as helpfulness and honesty. To build UltraFeedback Binarized, the completion with the highest average score is taken as the "chosen" completion, and one of the remaining three is picked at random as the "rejected" completion. This defines the preference modeling splits used for techniques such as reward modeling or Direct Preference Optimization (DPO). There are also splits for supervised fine-tuning (SFT), which use the "chosen" column as the dialogues to model, and splits for generation-based techniques such as rejection sampling or Proximal Policy Optimization (PPO).

Provider:

zhengr

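Summary of original information

UltraFeedback Binarized dataset overview

Dataset description

UltraFeedback Binarized is a preprocessed version of the UltraFeedback dataset, used to train the Zephyr-7B-β model, a state-of-the-art chat model at the 7B parameter scale at the time of release.

The original UltraFeedback dataset contains 64k prompts, each accompanied by four completions from a variety of open and proprietary models. GPT-4 assigns each completion a score based on criteria such as helpfulness and honesty. To create UltraFeedback Binarized, we take the completion with the highest average score as the "chosen" completion and pick one of the remaining three at random as the "rejected" completion. This defines the preference modeling splits for techniques such as reward modeling or DPO. We also create splits for supervised fine-tuning (SFT), which use the "chosen" column as the dialogues to model, and splits that involve generation, such as rejection sampling or PPO. For details on how the dataset was processed, see the accompanying script.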
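The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the processing script itself: the function name `binarize`, the input keys `text` and `scores`, and the sample values are all hypothetical, and the real script may break ties or sample differently.

```python
import random
from statistics import mean


def binarize(prompt, completions, rng=None):
    """Turn one prompt with several scored completions into a chosen/rejected pair.

    `completions` is a list of dicts with the hypothetical keys "text"
    and "scores" (one score per rating criterion).
    """
    rng = rng or random.Random()
    # Average the per-criterion scores of every completion.
    averaged = [(mean(c["scores"]), c["text"]) for c in completions]
    # "chosen" is the completion with the highest average (first one on ties).
    best_index = max(range(len(averaged)), key=lambda i: averaged[i][0])
    score_chosen, chosen = averaged[best_index]
    # "rejected" is drawn uniformly at random from the remaining completions.
    remaining = [pair for i, pair in enumerate(averaged) if i != best_index]
    score_rejected, rejected = rng.choice(remaining)
    return {
        "prompt": prompt,
        "chosen": chosen,
        "rejected": rejected,
        "score_chosen": score_chosen,
        "score_rejected": score_rejected,
    }


# Made-up example with four completions, as in the original dataset.
example = [
    {"text": "A", "scores": [8, 9]},
    {"text": "B", "scores": [5, 6]},
    {"text": "C", "scores": [7, 7]},
    {"text": "D", "scores": [3, 4]},
]
pair = binarize("some prompt", example, rng=random.Random(0))
```

Because the rejected completion is sampled rather than taken as the lowest-scoring one, the score gap between "chosen" and "rejected" varies from pair to pair.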

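Dataset structure

Data splits

The dataset contains six splits, suitable for:

Supervised fine-tuning (sft).
Preference modeling (prefs), to train reward models or apply techniques such as DPO.
Generation ranking (gen), via techniques such as rejection sampling or PPO.

The number of examples in each split is as follows: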

train_sft	test_sft	train_prefs	test_prefs	train_gen	test_gen
61966	1000	61966	2000	61966	1000
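As a rough sketch of how the split names above might be used, the snippet below records the table as a dictionary and wraps `load_dataset` from the Hugging Face `datasets` library. It assumes that library is installed and that the repository exposes splits under exactly these names; the helper names `splits_for` and `load_split` are hypothetical, and the row counts are copied from the table, so they may not match what the Hub serves today.

```python
# Split sizes copied from the table above.
SPLIT_SIZES = {
    "train_sft": 61966,
    "test_sft": 1000,
    "train_prefs": 61966,
    "test_prefs": 2000,
    "train_gen": 61966,
    "test_gen": 1000,
}


def splits_for(purpose):
    """Return the split names for a purpose: "sft", "prefs" or "gen"."""
    return sorted(name for name in SPLIT_SIZES if name.endswith("_" + purpose))


def load_split(name, repo_id="zhengr/ultrafeedback_binarized"):
    """Download one split from the Hugging Face Hub (needs network access)."""
    if name not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {name!r}")
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=name)
```

For DPO-style training, for example, `splits_for("prefs")` gives the two split names to pass to `load_split`.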

The dataset is stored in parquet format, and each entry uses the following schema:

{
    "prompt": "Which books are still banned in Canada?",
    "chosen": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ",
            "role": "assistant"
        }
    ],
    "messages": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ",
            "role": "assistant"
        }
    ],
    "prompt_id": "aeccf551d9ba42fdf5f2044de43b8ce6e360fb523ace428317b81d804594e090",
    "rejected": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "According to the Canadian Government’s Ban Affront website, there are still several books that are banned in Canada. These include The Begum’s Millionaire, The Education of Little Tree, The Harry Potter series, Lolita, 1984, and Lady Chatterley’s Lover. Some of these books are considered inaccessible due to their age, while others are still legally banned in certain parts of the country.",
            "role": "assistant"
        }
    ],
    "score_chosen": 8.0,
    "score_rejected": 5.0
}

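Use the chosen and rejected columns for techniques such as DPO; the messages column is suitable for SFT or PPO.

Citation

If you find this dataset useful in your work, please cite the original UltraFeedback dataset: https://huggingface.co/datasets/openbmb/UltraFeedback

You may also wish to cite the Zephyr 7B technical report: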
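To make the column roles concrete, here is a small sketch that turns one record with the schema shown earlier into a DPO-style triple and into an SFT transcript. It assumes every chosen and rejected conversation ends with an assistant turn, as in the sample entry; the helper names and the `role: content` transcript layout are hypothetical, and a real training pipeline would normally apply the model's own chat template instead.

```python
def final_assistant_text(messages):
    """Return the content of the last turn, which must be an assistant turn."""
    last = messages[-1]
    if last["role"] != "assistant":
        raise ValueError("expected the conversation to end with an assistant turn")
    return last["content"]


def to_dpo_example(record):
    """Build a (prompt, chosen, rejected) triple of plain strings for DPO."""
    return {
        "prompt": record["prompt"],
        "chosen": final_assistant_text(record["chosen"]),
        "rejected": final_assistant_text(record["rejected"]),
    }


def to_sft_text(record):
    """Flatten the "messages" column into a simple transcript for SFT."""
    return "\n".join(f'{m["role"]}: {m["content"]}' for m in record["messages"])


# Shortened, made-up record that follows the schema of the sample entry.
record = {
    "prompt": "Which books are still banned in Canada?",
    "chosen": [
        {"content": "Which books are still banned in Canada?", "role": "user"},
        {"content": "No books are currently banned.", "role": "assistant"},
    ],
    "rejected": [
        {"content": "Which books are still banned in Canada?", "role": "user"},
        {"content": "Several books are still banned.", "role": "assistant"},
    ],
    "messages": [
        {"content": "Which books are still banned in Canada?", "role": "user"},
        {"content": "No books are currently banned.", "role": "assistant"},
    ],
    "score_chosen": 8.0,
    "score_rejected": 5.0,
}

dpo_example = to_dpo_example(record)
sft_text = to_sft_text(record)
```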

@misc{tunstall2023zephyr,
    title={Zephyr: Direct Distillation of LM Alignment},
    author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf},
    year={2023},
    eprint={2310.16944},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
