open-perfectblend

Name: open-perfectblend
Creator: maas
Published: 2026-01-06 16:17:13
License: 暂无描述

魔搭社区2026-01-06 更新2024-10-26 收录

下载链接：

https://modelscope.cn/datasets/AI-ModelScope/open-perfectblend

下载链接

链接失效反馈

官方服务：

资源简介：

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/O0draAVUeUZI9qRMglywA.png) # 🎨 Open-PerfectBlend Open-PerfectBlend is an open-source reproduction of the instruction dataset introduced in the paper ["The Perfect Blend: Redefining RLHF with Mixture of Judges"](https://arxiv.org/abs/2409.20370). It's a solid general-purpose instruction dataset with chat, math, code, and instruction-following data. ## Data source ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/rQ7db032OcjTZ2i2cpvY7.png) Here is the list of the datasets used in this mix: | Dataset | # Samples | |------|------| | [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395,000 | | [openbmb/UltraInteract_sft](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 288,579 | | [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 207,865 | | [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200,035 | | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 187,405 | | [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | 111,272 | | [Post-training-Data-Flywheel/AutoIF-instruct-61k](https://huggingface.co/datasets/Post-training-Data-Flywheel/AutoIF-instruct-61k) | 61,492 | | [mlabonne/lmsys-arena-human-preference-55k-sharegpt](https://huggingface.co/datasets/mlabonne/lmsys-arena-human-preference-55k-sharegpt) | 57,362 | The deduplication process removed 88.1k samples across all datasets. All of these datasets use either an Apache 2.0 or MIT license. Thanks to OpenBMB, MetaMath, Hugging Face, Microsoft, theblackcat102, Post-training-Data-Flywheel, and LMSYS for the data! ## Comparison Here is the extract from the paper with the dataset mixture: ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/QObNW7erIVb_WM8C3hxDo.png) There are two main differences with the dataset described in the paper: * Instruction-following data comes from another source because Meta didn't release their dataset. * The harmful intent hasn't been released either, so I didn't add any data in this category.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/O0draAVUeUZI9qRMglywA.png) # 🎨 Open-PerfectBlend 数据集 Open-PerfectBlend 是对论文《The Perfect Blend: 以评判者混合模型重新定义人类反馈强化学习（Reinforcement Learning with Human Feedback, RLHF）》（https://arxiv.org/abs/2409.20370）中提出的指令数据集的开源复现项目。该数据集为通用型高质量指令数据集，涵盖对话、数学、代码及指令遵循类数据。 ## 数据源 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/rQ7db032OcjTZ2i2cpvY7.png) 本次数据集混合所采用的原始数据集列表如下： | 数据集名称 | 样本数量 | |------|------| | [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395,000 | | [openbmb/UltraInteract_sft](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 288,579 | | [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 207,865 | | [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200,035 | | [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 187,405 | | [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | 111,272 | | [Post-training-Data-Flywheel/AutoIF-instruct-61k](https://huggingface.co/datasets/Post-training-Data-Flywheel/AutoIF-instruct-61k) | 61,492 | | [mlabonne/lmsys-arena-human-preference-55k-sharegpt](https://huggingface.co/datasets/mlabonne/lmsys-arena-human-preference-55k-sharegpt) | 57,362 | 本次去重流程共移除了总计88.1k条样本，所有原始数据集均采用Apache 2.0或MIT开源许可协议。在此感谢OpenBMB、MetaMath、Hugging Face、Microsoft、theblackcat102、Post-training-Data-Flywheel以及LMSYS提供相关数据集！ ## 对比说明以下为原论文中关于该数据集混合方案的摘录内容： ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/QObNW7erIVb_WM8C3hxDo.png) 本复现数据集与原论文描述的原版数据集存在两处核心差异： * 由于Meta并未公开其原始指令遵循数据集，本次复现采用了其他来源的同类数据； * 原论文涉及的有害意图相关数据并未公开，因此本项目未添加该类别下的任何数据。

提供机构：

maas

创建时间：

2024-10-19

搜集汇总

数据集介绍

背景与挑战

背景概述

Open-PerfectBlend 是一个开源复现的指令数据集，基于论文'The Perfect Blend: Redefining RLHF with Mixture of Judges'构建，融合了聊天、数学、代码和指令遵循等多种数据源。它通过整合多个公开数据集并经过去重处理而成，适用于通用目的的训练任务。

以上内容由遇见数据集搜集并总结生成