HuggingFaceH4/ultrafeedback_binarized
收藏Hugging Face2024-01-08 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/HuggingFaceH4/ultrafeedback_binarized
下载链接
链接失效反馈官方服务:
资源简介:
这是UltraFeedback数据集的预处理版本,用于训练Zephyr-7Β-β模型,这是一个在7B参数规模上的先进聊天模型。原始UltraFeedback数据集包含64k个提示,每个提示有四个来自各种开放和专有模型的完成。GPT-4用于为每个完成评分,评分标准包括帮助性和诚实性。为了创建UltraFeedback Binarized,我们选择了最高overall_score的完成作为“chosen”完成,并随机选择其余三个中的一个作为“rejected”完成。这定义了用于奖励建模或DPO等技术的偏好建模分割。我们还创建了用于监督微调(SFT)的分割,使用“chosen”列作为对话进行建模,以及涉及生成的分割,如拒绝采样或PPO。
This is a preprocessed variant of the UltraFeedback dataset, intended for training the Zephyr-7Β-β model—an advanced 7B-parameter chat model.
The original UltraFeedback dataset contains 64k prompts, each paired with four completions generated by various open-source and proprietary models. GPT-4 was employed to score each completion, with evaluation criteria covering helpfulness and honesty.
To create the UltraFeedback Binarized dataset, we selected the completion with the highest overall_score as the "chosen" completion, and randomly picked one of the remaining three as the "rejected" completion. This defines the preference modeling split for techniques such as reward modeling or DPO.
We also developed splits for Supervised Fine-Tuning (SFT), which use the "chosen" column to model dialogues, as well as generation-oriented splits such as rejection sampling or PPO.
提供机构:
HuggingFaceH4
原始信息汇总
数据集概述
名称: UltraFeedback Binarized
语言: 英语
许可证: MIT
任务类别:
- 对话式
- 文本生成
配置:
- 默认配置
- 数据文件:
- train_prefs: data/train_prefs-*
- train_sft: data/train_sft-*
- test_prefs: data/test_prefs-*
- test_sft: data/test_sft-*
- train_gen: data/train_gen-*
- test_gen: data/test_gen-*
- 数据文件:
数据集信息:
- 特征:
- prompt: 字符串
- prompt_id: 字符串
- chosen:
- content: 字符串
- role: 字符串
- rejected:
- content: 字符串
- role: 字符串
- messages:
- content: 字符串
- role: 字符串
- score_chosen: float64
- score_rejected: float64
数据分割:
- train_prefs: 61135个示例,405688662字节
- train_sft: 61135个示例,405688662字节
- test_prefs: 2000个示例,13161585字节
- test_sft: 1000个示例,6697333字节
- train_gen: 61135个示例,325040536字节
- test_gen: 1000个示例,5337695字节
下载大小: 649967196字节 数据集大小: 1161614473字节
数据集结构
-
使用: python from datasets import load_dataset ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
-
数据分割详情:
train_sft test_sft train_prefs test_prefs train_gen test_gen 61135 1000 61135 2000 61135 1000 -
数据存储格式: Parquet
-
数据集架构: json { "prompt": "字符串", "chosen": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "messages": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "prompt_id": "字符串", "rejected": [ {"content": "字符串", "role": "字符串"}, {"content": "字符串", "role": "字符串"} ], "score_chosen": "float64", "score_rejected": "float64" }
引用
- 原始数据集: https://huggingface.co/datasets/openbmb/UltraFeedback
- Zephyr 7B技术报告: bibtex @misc{tunstall2023zephyr, title={Zephyr: Direct Distillation of LM Alignment}, author={Lewis Tunstall and others}, year={2023}, eprint={2310.16944}, archivePrefix={arXiv}, primaryClass={cs.LG} }



