
open-perfectblend

ModelScope community (魔搭社区). Updated 2026-01-06; indexed 2024-10-26.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/open-perfectblend
Description:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/O0draAVUeUZI9qRMglywA.png)

# 🎨 Open-PerfectBlend

Open-PerfectBlend is an open-source reproduction of the instruction dataset introduced in the paper ["The Perfect Blend: Redefining RLHF with Mixture of Judges"](https://arxiv.org/abs/2409.20370). It's a solid general-purpose instruction dataset with chat, math, code, and instruction-following data.

## Data source

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/rQ7db032OcjTZ2i2cpvY7.png)

Here is the list of the datasets used in this mix:

| Dataset | # Samples |
|------|------|
| [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395,000 |
| [openbmb/UltraInteract_sft](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 288,579 |
| [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 207,865 |
| [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200,035 |
| [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 187,405 |
| [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | 111,272 |
| [Post-training-Data-Flywheel/AutoIF-instruct-61k](https://huggingface.co/datasets/Post-training-Data-Flywheel/AutoIF-instruct-61k) | 61,492 |
| [mlabonne/lmsys-arena-human-preference-55k-sharegpt](https://huggingface.co/datasets/mlabonne/lmsys-arena-human-preference-55k-sharegpt) | 57,362 |

The deduplication process removed 88.1k samples across all datasets. All of these datasets use either an Apache 2.0 or MIT license.

Thanks to OpenBMB, MetaMath, Hugging Face, Microsoft, theblackcat102, Post-training-Data-Flywheel, and LMSYS for the data!
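As a rough sanity check on the mix, the per-source counts listed above can be totaled and turned into shares. This is a minimal sketch, not part of the original card: the names `SOURCE_COUNTS`, `DEDUP_REMOVED`, and `mixture_shares` are illustrative, and the 88.1k figure is rounded. The card does not say whether the listed counts are taken before or after deduplication, so the final estimate assumes they are pre-deduplication counts.

```python
# Per-source sample counts, copied from the table above.
SOURCE_COUNTS = {
    "meta-math/MetaMathQA": 395_000,
    "openbmb/UltraInteract_sft": 288_579,
    "HuggingFaceH4/ultrachat_200k": 207_865,
    "microsoft/orca-math-word-problems-200k": 200_035,
    "HuggingFaceH4/ultrafeedback_binarized": 187_405,
    "theblackcat102/evol-codealpaca-v1": 111_272,
    "Post-training-Data-Flywheel/AutoIF-instruct-61k": 61_492,
    "mlabonne/lmsys-arena-human-preference-55k-sharegpt": 57_362,
}

# "88.1k" as stated in the card; a rounded figure, not an exact count.
DEDUP_REMOVED = 88_100


def mixture_shares(counts):
    """Return each source's fraction of the summed sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(SOURCE_COUNTS.values())
shares = mixture_shares(SOURCE_COUNTS)

# Print sources from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<52} {share:6.1%}")

print(f"Sum of listed counts: {total:,}")  # 1,509,010

# ASSUMPTION: the listed counts are pre-deduplication.
print(f"Approximate size after deduplication: {total - DEDUP_REMOVED:,}")
```

Under that assumption, the deduplicated mix would hold roughly 1.42M samples, with the two math sources contributing the largest combined share.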
## Comparison

Here is the extract from the paper with the dataset mixture:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/QObNW7erIVb_WM8C3hxDo.png)

There are two main differences with the dataset described in the paper:

* Instruction-following data comes from another source because Meta didn't release their dataset.
* The harmful-intent data hasn't been released either, so I didn't add any data in this category.

Provider: maas
Created: 2024-10-19