open-perfectblend
收藏魔搭社区2026-01-06 更新2024-10-26 收录
下载链接:
https://modelscope.cn/datasets/AI-ModelScope/open-perfectblend
下载链接
链接失效反馈官方服务:
资源简介:

# 🎨 Open-PerfectBlend
Open-PerfectBlend is an open-source reproduction of the instruction dataset introduced in the paper ["The Perfect Blend: Redefining RLHF with Mixture of Judges"](https://arxiv.org/abs/2409.20370).
It's a solid general-purpose instruction dataset with chat, math, code, and instruction-following data.
## Data source

Here is the list of the datasets used in this mix:
| Dataset | # Samples |
|------|------|
| [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395,000 |
| [openbmb/UltraInteract_sft](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 288,579 |
| [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 207,865 |
| [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200,035 |
| [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 187,405 |
| [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | 111,272 |
| [Post-training-Data-Flywheel/AutoIF-instruct-61k](https://huggingface.co/datasets/Post-training-Data-Flywheel/AutoIF-instruct-61k) | 61,492 |
| [mlabonne/lmsys-arena-human-preference-55k-sharegpt](https://huggingface.co/datasets/mlabonne/lmsys-arena-human-preference-55k-sharegpt) | 57,362 |
The deduplication process removed 88.1k samples across all datasets. All of these datasets use either an Apache 2.0 or MIT license.
Thanks to OpenBMB, MetaMath, Hugging Face, Microsoft, theblackcat102, Post-training-Data-Flywheel, and LMSYS for the data!
## Comparison
Here is the extract from the paper with the dataset mixture:

There are two main differences with the dataset described in the paper:
* Instruction-following data comes from another source because Meta didn't release their dataset.
* The harmful intent hasn't been released either, so I didn't add any data in this category.

# 🎨 Open-PerfectBlend 数据集
Open-PerfectBlend 是对论文《The Perfect Blend: 以评判者混合模型重新定义人类反馈强化学习(Reinforcement Learning with Human Feedback, RLHF)》(https://arxiv.org/abs/2409.20370)中提出的指令数据集的开源复现项目。该数据集为通用型高质量指令数据集,涵盖对话、数学、代码及指令遵循类数据。
## 数据源

本次数据集混合所采用的原始数据集列表如下:
| 数据集名称 | 样本数量 |
|------|------|
| [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA) | 395,000 |
| [openbmb/UltraInteract_sft](https://huggingface.co/datasets/openbmb/UltraInteract_sft) | 288,579 |
| [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | 207,865 |
| [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) | 200,035 |
| [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) | 187,405 |
| [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | 111,272 |
| [Post-training-Data-Flywheel/AutoIF-instruct-61k](https://huggingface.co/datasets/Post-training-Data-Flywheel/AutoIF-instruct-61k) | 61,492 |
| [mlabonne/lmsys-arena-human-preference-55k-sharegpt](https://huggingface.co/datasets/mlabonne/lmsys-arena-human-preference-55k-sharegpt) | 57,362 |
本次去重流程共移除了总计88.1k条样本,所有原始数据集均采用Apache 2.0或MIT开源许可协议。
在此感谢OpenBMB、MetaMath、Hugging Face、Microsoft、theblackcat102、Post-training-Data-Flywheel以及LMSYS提供相关数据集!
## 对比说明
以下为原论文中关于该数据集混合方案的摘录内容:

本复现数据集与原论文描述的原版数据集存在两处核心差异:
* 由于Meta并未公开其原始指令遵循数据集,本次复现采用了其他来源的同类数据;
* 原论文涉及的有害意图相关数据并未公开,因此本项目未添加该类别下的任何数据。
提供机构:
maas
创建时间:
2024-10-19



