
ultrafeedback_binarized

Source: ModelScope Community · Updated: 2026-05-11 · Indexed: 2024-06-15
Download link: https://modelscope.cn/datasets/llamafactory/ultrafeedback_binarized
Description:
Borrowed from: https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized

# Dataset Card for UltraFeedback Binarized

## Dataset Description

This is a pre-processed version of the [UltraFeedback dataset](https://huggingface.co/datasets/openbmb/UltraFeedback) and was used to train [Zephyr-7B-β](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), a state-of-the-art chat model at the 7B parameter scale.

The original UltraFeedback dataset consists of 64k prompts, each paired with four completions from a wide variety of open and proprietary models. GPT-4 then scored each completion along criteria such as helpfulness and honesty. To create `UltraFeedback Binarized`, we took the completion with the highest `overall_score` as the "chosen" completion and one of the remaining three, picked at random, as the "rejected" one. This defines the preference-modelling splits for techniques such as reward modelling and direct preference optimization (DPO). We also created splits for supervised fine-tuning (SFT) that use the `chosen` column as the dialogues to model, along with splits for generation-based techniques such as rejection sampling and proximal policy optimization (PPO). For details of the processing, see the accompanying [processing script](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized/blob/main/create_dataset.py).

## Dataset Structure

### Usage

To load the dataset, run:

```python
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")
```

**Note:** After the release of Zephyr-7B-β, the [Argilla](https://huggingface.co/argilla) team found a few hundred completions with incorrect labels. Community members also pointed out that some prompts were sourced from the TruthfulQA benchmark, which can contaminate results on public leaderboards. Both issues are fixed in the latest version of the dataset. To use the older version instead, run:

```python
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", revision="292c16329d921287c4166934cac1a6ad1e13a6c5")
```

### Data Splits

The dataset has six splits, a train and a test split for each of three uses:

- Supervised fine-tuning (`sft`)
- Preference modelling (`prefs`), to train reward models or apply techniques such as DPO
- Generation ranking (`gen`), via techniques such as rejection sampling or PPO

The number of examples in each split is:

| train_sft | test_sft | train_prefs | test_prefs | train_gen | test_gen |
|:---------:|:--------:|:-----------:|:----------:|:---------:|:--------:|
| 61135 | 1000 | 61135 | 2000 | 61135 | 1000 |

The dataset is stored in Parquet format, and each entry has the following schema:

```json
{
    "prompt": "Which books are still banned in Canada?",
    "chosen": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ",
            "role": "assistant"
        }
    ],
    "messages": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "As a responsible and ethical AI assistant, I must inform you that I cannot provide a list of banned books in Canada, as there are no books that are currently banned in the country. Canada has a strong commitment to freedom of expression and the right to access information, and there are no laws in place that restrict the publication or distribution of books. ",
            "role": "assistant"
        }
    ],
    "prompt_id": "aeccf551d9ba42fdf5f2044de43b8ce6e360fb523ace428317b81d804594e090",
    "rejected": [
        {
            "content": "Which books are still banned in Canada?",
            "role": "user"
        },
        {
            "content": "According to the Canadian Government’s Ban Affront website, there are still several books that are banned in Canada. These include The Begum’s Millionaire, The Education of Little Tree, The Harry Potter series, Lolita, 1984, and Lady Chatterley’s Lover. Some of these books are considered inaccessible due to their age, while others are still legally banned in certain parts of the country.",
            "role": "assistant"
        }
    ],
    "score_chosen": 8.0,
    "score_rejected": 5.0
}
```

Use the `chosen` and `rejected` columns for techniques such as DPO, and the `messages` column for SFT or PPO.

## Citation

If you find this dataset useful in your work, please cite the original UltraFeedback dataset:

```bibtex
@misc{cui2023ultrafeedback,
      title={UltraFeedback: Boosting Language Models with High-quality Feedback},
      author={Ganqu Cui and Lifan Yuan and Ning Ding and Guanming Yao and Wei Zhu and Yuan Ni and Guotong Xie and Zhiyuan Liu and Maosong Sun},
      year={2023},
      eprint={2310.01377},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
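The binarization step described in the dataset description (highest `overall_score` becomes "chosen", a random one of the rest becomes "rejected") can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the accompanying `create_dataset.py` script: each completion is assumed to be a dict with a `response` string and an `overall_score` number (a hypothetical input shape), ties for the top score go to the first completion, and the `prompt_id` column is omitted.

```python
import random
from typing import Any, Dict, List


def binarize(prompt: str, completions: List[Dict[str, Any]], rng: random.Random) -> Dict[str, Any]:
    """Turn one prompt and its rated completions into a chosen/rejected pair.

    Assumed (hypothetical) input shape: each completion is a dict with a
    "response" string and an "overall_score" number.
    """
    if len(completions) < 2:
        raise ValueError("need at least two completions to form a pair")

    # Chosen: highest overall_score; max() keeps the first completion on ties.
    best = max(range(len(completions)), key=lambda i: completions[i]["overall_score"])
    # Rejected: one of the remaining completions, drawn uniformly at random.
    others = [c for i, c in enumerate(completions) if i != best]
    chosen, rejected = completions[best], rng.choice(others)

    def as_chat(response: str) -> List[Dict[str, str]]:
        # Two-turn conversation in the {"content", "role"} format of the example entry.
        return [
            {"content": prompt, "role": "user"},
            {"content": response, "role": "assistant"},
        ]

    return {
        "prompt": prompt,
        "chosen": as_chat(chosen["response"]),
        "rejected": as_chat(rejected["response"]),
        "messages": as_chat(chosen["response"]),  # SFT dialogue = chosen conversation
        "score_chosen": float(chosen["overall_score"]),
        "score_rejected": float(rejected["overall_score"]),
    }


if __name__ == "__main__":
    # Four made-up completions with made-up scores, for illustration only.
    completions = [
        {"response": "A", "overall_score": 5.0},
        {"response": "B", "overall_score": 8.0},
        {"response": "C", "overall_score": 6.5},
        {"response": "D", "overall_score": 3.0},
    ]
    record = binarize("Which books are still banned in Canada?", completions, random.Random(0))
    print(record["chosen"][1]["content"], record["score_chosen"])
```

Passing a seeded `random.Random` makes the rejected pick reproducible. The card does not say how the real script breaks ties or treats prompts whose completions all share one score; check `create_dataset.py` for that.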
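Following the column guidance in the dataset structure section, a DPO pipeline often wants plain (prompt, chosen response, rejected response) strings, while SFT can consume the `messages` turns directly. The sketch below pulls those strings out of a record shaped like the example entry. The helper names `last_assistant_turn` and `to_dpo_triple` are hypothetical, and it assumes the final assistant turn in each conversation is the response being compared.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, str]


def last_assistant_turn(conversation: List[Message]) -> str:
    """Return the content of the final assistant message in a conversation."""
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("conversation has no assistant turn")


def to_dpo_triple(record: Dict[str, Any]) -> Tuple[str, str, str]:
    """Hypothetical helper: (prompt, chosen response, rejected response) for DPO."""
    return (
        record["prompt"],
        last_assistant_turn(record["chosen"]),
        last_assistant_turn(record["rejected"]),
    )


if __name__ == "__main__":
    # Shortened version of the example entry; assistant texts are abbreviated.
    question = "Which books are still banned in Canada?"
    example = {
        "prompt": question,
        "chosen": [
            {"content": question, "role": "user"},
            {"content": "As a responsible and ethical AI assistant, ...", "role": "assistant"},
        ],
        "rejected": [
            {"content": question, "role": "user"},
            {"content": "According to the Canadian Government's ...", "role": "assistant"},
        ],
    }
    prompt, chosen, rejected = to_dpo_triple(example)
    print(prompt)
    print(chosen)
    print(rejected)
```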
Provider: maas
Created: 2024-06-08