Magpie-Reasoning-V2-250K-CoT-QwQ

Name: Magpie-Reasoning-V2-250K-CoT-QwQ
Creator: maas
Published: 2026-01-02 16:20:36
License: 暂无描述

魔搭社区2026-01-02 更新2025-01-18 收录

下载链接：

https://modelscope.cn/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ

下载链接

链接失效反馈

官方服务：

资源简介：

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## Abstract <details><summary>Click Here</summary> High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench. </details><be> 🤨 Also take a look at our V1 (150K data) with new response generators here: - [Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K) (Llama3-70B-Instruct) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ) (QwQ-32B-Preview) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Align/Skywork-O1-Llama-3.1-8B) (Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B) (Deepseek-R1-Llama-70B) <span style="color:red">🤨 Take a look on more diverse CoT styles here!</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) [You're here!] - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## Dataset Details This dataset is generated by [Meta's Llama 3.1 70B Instruct](meta-llama/Llama-3.1-70B-Instruct), [Llama 3.3 70B Instruct](meta-llama/Llama-3.3-70B-Instruct) and [QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) using [Magpie framework](https://huggingface.co/Magpie-Align). Specifically, the instructions are generated by Llama 3.1 70B Instruct and Llama 3.3 70B Instruct, and the responses are generated by QwQ-32B-Preview. Please refer to our [paper](https://arxiv.org/abs/2406.08464) and [codebase](https://github.com/magpie-align/magpie) for implementation details. The motivation for developing this dataset is to augment the reasoning capabilities of our models through the utilization of high-quality instruction-response pairs. ## Instruction and Response Sources The instructions come from [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3). Please refer to the corresponding dataset card for details. The responses are generated by [QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview). Please note that for this release, **we do not apply any response filter**. If you are going to train your LLMs using this dataset, we recommend performing dataset filtering before training. ## License We release this dataset for research purpose only. For other usage, please follow: - [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE), - [Meta Llama 3.3 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE), - [Tongyi Qianwen Lincense Agreement](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT), and - [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). ## 📚 Citation If you find the model, data, or code useful, please cite our paper: ``` @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```

![喜鹊（Magpie）](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) 项目主页：[https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv技术报告：[https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) 代码仓库：[https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## 摘要 <details><summary>点击展开</summary> 高质量指令数据对于对齐大语言模型（Large Language Model, LLM）至关重要。尽管部分模型（如Llama-3-Instruct）已开放权重，但其对齐数据仍处于私有状态，这阻碍了人工智能的民主化进程。现有开源数据构建方法面临人工成本高昂、提示范围预定义且有限的问题，难以实现有效扩展，进而可能限制公开对齐数据集的多样性与质量。能否通过直接从已对齐的大语言模型中提取数据，规模化合成高质量指令数据？我们提出了一种用于生成大规模对齐数据的自合成方法，命名为Magpie。我们的核心观察是：得益于自回归特性，仅向Llama-3-Instruct等已对齐模型输入用户消息预留位置之前的左侧模板，模型即可生成用户查询。我们利用该方法对Llama-3-Instruct进行提示，生成了400万条指令及其对应的响应。我们对提取的数据进行了全面分析，并筛选出30万个高质量样本。为了将Magpie数据集与其他公开指令数据集进行对比，我们使用每个数据集分别微调Llama-3-8B-Base，并评估微调后模型的性能。结果表明，在部分任务中，使用Magpie数据集微调的模型性能可与官方Llama-3-8B-Instruct相媲美——尽管后者通过监督微调（Supervised Fine-Tuning, SFT）与后续反馈学习，使用了1000万条数据进行增强。我们还证明，仅使用Magpie数据集进行监督微调，即可超越此前用于监督微调与偏好优化（如结合UltraFeedback的直接偏好优化）的公开数据集的性能。这一优势在AlpacaEval、ArenaHard与WildBench等对齐基准测试中均有体现。 </details> 🤨 亦可查看我们的V1版本（15万条数据），配套全新响应生成器： - [Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K)（基于Llama3-70B-Instruct） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ)（基于QwQ-32B-Preview） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Align/Skywork-O1-Llama-3.1-8B)（基于Skywork-O1-Llama-3.1-8B） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B)（基于Deepseek-R1-Llama-70B） <span style="color:red">🤨 在此探索更多样化的思维链（Chain of Thought, CoT）风格数据集！</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ)【您当前所在的数据集！】 - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## 数据集详情本数据集由[Meta的Llama 3.1 70B Instruct](meta-llama/Llama-3.1-70B-Instruct)、[Llama 3.3 70B Instruct](meta-llama/Llama-3.3-70B-Instruct)与[QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)通过[Magpie框架](https://huggingface.co/Magpie-Align)生成。具体而言，指令由Llama 3.1 70B Instruct与Llama 3.3 70B Instruct生成，响应则由QwQ-32B-Preview生成。有关实现细节，请参阅我们的[论文](https://arxiv.org/abs/2406.08464)与[代码仓库](https://github.com/magpie-align/magpie)。开发本数据集的动机在于，通过利用高质量的指令-响应对，增强模型的推理能力。 ## 指令与响应来源指令源自[Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3)，详细信息请参阅对应数据集卡片。响应由[QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview)生成。请注意，本次发布的数据集**未应用任何响应筛选步骤**。若您计划使用本数据集训练大语言模型，我们建议在训练前先对数据集进行筛选。 ## 许可协议我们仅将本数据集用于研究目的发布。如需其他用途，请遵循以下协议： - [Meta Llama 3.1 社区许可协议](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) - [Meta Llama 3.3 社区许可协议](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE) - [通义千问许可协议](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT) - [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en) ## 📚 引用若您认为本模型、数据集或代码对您的研究有所帮助，请引用我们的论文： @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} }

提供机构：

maas

创建时间：

2025-01-15

搜集汇总

数据集介绍

背景与挑战

背景概述

Magpie-Reasoning-V2-250K-CoT-QwQ是一个用于增强模型推理能力的高质量指令数据集，通过Magpie框架生成，其中指令由Llama 3.1 70B Instruct和Llama 3.3 70B Instruct提供，响应由QwQ-32B-Preview生成，且未经过滤。该数据集以研究目的发布，需遵循相关开源许可证。

以上内容由遇见数据集搜集并总结生成

5,000+

优质数据集

54 个

任务类型

进入经典数据集