HuggingFaceH4/ultrafeedback_binarized

Name: HuggingFaceH4/ultrafeedback_binarized
Creator: HuggingFaceH4
Published: 2024-01-08 11:33:19
License: 暂无描述

Hugging Face2024-01-08 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/HuggingFaceH4/ultrafeedback_binarized

下载链接

链接失效反馈

官方服务：

资源简介：

这是UltraFeedback数据集的预处理版本，用于训练Zephyr-7Β-β模型，这是一个在7B参数规模上的先进聊天模型。原始UltraFeedback数据集包含64k个提示，每个提示有四个来自各种开放和专有模型的完成。GPT-4用于为每个完成评分，评分标准包括帮助性和诚实性。为了创建UltraFeedback Binarized，我们选择了最高overall_score的完成作为“chosen”完成，并随机选择其余三个中的一个作为“rejected”完成。这定义了用于奖励建模或DPO等技术的偏好建模分割。我们还创建了用于监督微调（SFT）的分割，使用“chosen”列作为对话进行建模，以及涉及生成的分割，如拒绝采样或PPO。

This is a preprocessed variant of the UltraFeedback dataset, intended for training the Zephyr-7Β-β model—an advanced 7B-parameter chat model. The original UltraFeedback dataset contains 64k prompts, each paired with four completions generated by various open-source and proprietary models. GPT-4 was employed to score each completion, with evaluation criteria covering helpfulness and honesty. To create the UltraFeedback Binarized dataset, we selected the completion with the highest overall_score as the "chosen" completion, and randomly picked one of the remaining three as the "rejected" completion. This defines the preference modeling split for techniques such as reward modeling or DPO. We also developed splits for Supervised Fine-Tuning (SFT), which use the "chosen" column to model dialogues, as well as generation-oriented splits such as rejection sampling or PPO.

提供机构：

HuggingFaceH4

原始信息汇总

数据集概述

名称: UltraFeedback Binarized

语言: 英语

许可证: MIT

任务类别:

对话式
文本生成

配置:

默认配置
- 数据文件:
  - train_prefs: data/train_prefs-*
  - train_sft: data/train_sft-*
  - test_prefs: data/test_prefs-*
  - test_sft: data/test_sft-*
  - train_gen: data/train_gen-*
  - test_gen: data/test_gen-*

数据集信息:

特征:
- prompt: 字符串
- prompt_id: 字符串
- chosen:
  - content: 字符串
  - role: 字符串
- rejected:
  - content: 字符串
  - role: 字符串
- messages:
  - content: 字符串
  - role: 字符串
- score_chosen: float64
- score_rejected: float64

数据分割:

train_prefs: 61135个示例，405688662字节
train_sft: 61135个示例，405688662字节
test_prefs: 2000个示例，13161585字节
test_sft: 1000个示例，6697333字节
train_gen: 61135个示例，325040536字节
test_gen: 1000个示例，5337695字节

下载大小: 649967196字节 数据集大小: 1161614473字节

数据集结构

使用: python from datasets import load_dataset ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
数据分割详情:

train_sft test_sft train_prefs test_prefs train_gen test_gen

61135 1000 61135 2000 61135 1000
数据存储格式: Parquet
数据集架构: json { "prompt": "字符串", "chosen": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "messages": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "prompt_id": "字符串", "rejected": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "score_chosen": "float64", "score_rejected": "float64" }

引用

原始数据集: https://huggingface.co/datasets/openbmb/UltraFeedback
Zephyr 7B技术报告: bibtex @misc{tunstall2023zephyr, title={Zephyr: Direct Distillation of LM Alignment}, author={Lewis Tunstall and others}, year={2023}, eprint={2310.16944}, archivePrefix={arXiv}, primaryClass={cs.LG} }

5,000+

优质数据集

54 个

任务类型

进入经典数据集