zhengr/ultrafeedback_binarized
收藏UltraFeedback Binarized 数据集概述
数据集描述
UltraFeedback Binarized 是 UltraFeedback 数据集 的预处理版本,用于训练 Zephyr-7Β-β 模型,该模型是当前 7B 参数规模的先进聊天模型。
原始 UltraFeedback 数据集包含 64k 个提示,每个提示伴随四个来自不同开放和专有模型的完成。GPT-4 根据帮助性和诚实性等标准为每个完成分配一个分数。为了创建 UltraFeedback Binarized,我们选择了最高平均分数作为“chosen”完成,并从剩余的三个中随机选择一个作为“rejected”完成。这定义了用于奖励建模或 DPO 等技术的偏好建模拆分。我们还创建了用于监督微调(SFT)的拆分,使用“chosen”列作为对话模型,以及涉及生成(如拒绝采样或 PPO)的拆分。有关数据集处理的详细信息,请参阅脚本。
数据集结构
数据拆分
数据集包含六个拆分,适用于:
- 监督微调(
sft)。 - 偏好建模(
prefs)以训练奖励模型或应用 DPO 等技术。 - 生成排序(
gen)通过拒绝采样或 PPO 等技术。
每个拆分的示例数量如下:
| train_sft | test_sft | train_prefs | test_prefs | train_gen | test_gen |
|---|---|---|---|---|---|
| 61966 | 1000 | 61966 | 2000 | 61966 | 1000 |
数据集以 parquet 格式存储,每个条目使用以下模式:
json { "prompt": "Which books are still banned in Canada?", "chosen": [ { "content": "Which books are still banned in Canada?", "role": "user" }, { "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ", "role": "assistant" } ], "messages": [ { "content": "Which books are still banned in Canada?", "role": "user" }, { "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ", "role": "assistant" }],
"prompt_id": "aeccf551d9ba42fdf5f2044de43b8ce6e360fb523ace428317b81d804594e090", "rejected": [ { "content": "Which books are still banned in Canada?", "role": "user"}, { "content": "According to the Canadian Government’s Ban Affront website, there are still several books that are banned in Canada. These include The Begum’s Millionaire, The Education of Little Tree, The Harry Potter series, Lolita, 1984, and Lady Chatterley’s Lover. Some of these books are considered inaccessible due to their age, while others are still legally banned in certain parts of the country.", "role": "assistant" } ], "score_chosen": 8.0, "score_rejected": 5.0 }
您应该使用 chosen 和 rejected 列进行 DPO 等技术,而 messages 列适用于 SFT 或 PPO。
引用
如果您在工作中发现此数据集有用,请引用原始 UltraFeedback 数据集:https://huggingface.co/datasets/openbmb/UltraFeedback
您也可以引用 Zephyr 7B 技术报告:
@misc{tunstall2023zephyr, title={Zephyr: Direct Distillation of LM Alignment}, author={Lewis Tunstall and Edward Beeching and Nathan Lambert and Nazneen Rajani and Kashif Rasul and Younes Belkada and Shengyi Huang and Leandro von Werra and Clémentine Fourrier and Nathan Habib and Nathan Sarrazin and Omar Sanseviero and Alexander M. Rush and Thomas Wolf}, year={2023}, eprint={2310.16944}, archivePrefix={arXiv}, primaryClass={cs.LG} }




