Magpie-Reasoning-V1-150K

Name: Magpie-Reasoning-V1-150K
Creator: maas
Published: 2026-01-02 16:20:34
License: 暂无描述

魔搭社区2026-01-02 更新2025-01-18 收录

下载链接：

https://modelscope.cn/datasets/Magpie-Align/Magpie-Reasoning-V1-150K

下载链接

链接失效反馈

官方服务：

资源简介：

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## Abstract <details><summary>Click Here</summary> High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench. </details><be> <span style="color:red">🤨 News: Take a look on our new reasoning datasets with diverse CoT styles here!</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## Dataset Details This dataset is generated by [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) and [Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) using [Magpie](https://huggingface.co/Magpie-Align). Specifically, the instructions are generated by Qwen2-72B-Instruct, and the responses are generated by Llama 3 70B Instruct. Please refer to our [paper](https://arxiv.org/abs/2406.08464) and [codebase](https://github.com/magpie-align/magpie) for implementation details. The motivation for developing this dataset is to augment the reasoning capabilities of our models through the utilization of high-quality instruction-response pairs. You can find the model SFT checkpoint fine-tuned using this dataset [here](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.2). ## Filter Setups - **Input Quality**: >= good - **Input Difficulty**: >= easy - **Task Category**: Reasoning, Math, Coding & Debugging - **Instruction Reward**: >=-10 - **Language**: English - Remove repetition and incomplete instructions (e.g., end with :) - Choose 150K data with the longest responses ## License Please follow [Meta Llama 3 Community License](https://llama.meta.com/llama3/license/), [Tongyi Qianwen Lincense Agreement](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT) and [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en). ## 📚 Citation If you find the model, data, or code useful, please cite our paper: ``` @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```

![喜鹊（Magpie）](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) 项目网页：[https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv技术报告：[https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) 代码仓库：[https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## 摘要 <details><summary>点击此处展开</summary> 高质量指令数据对大语言模型（Large Language Model, LLM）的对齐工作至关重要。尽管诸如Llama-3-Instruct等部分模型已开放权重，但其对齐数据仍处于私有状态，这阻碍了人工智能的民主化进程。高昂的人力成本与受限的预定义提示范围，使得现有开源数据构建方法难以有效规模化扩展，进而可能制约了公开对齐数据集的多样性与质量。能否直接从已对齐的大语言模型中提取并规模化生成高质量指令数据？为此我们提出了一种用于规模化生成对齐数据的自合成方法，命名为Magpie（喜鹊）。我们的核心观察在于：得益于自回归特性，诸如Llama-3-Instruct这类已对齐的大语言模型，仅需输入至用户消息预留位置的左侧模板，即可生成用户查询。我们借助该方法对Llama-3-Instruct进行提示，生成了400万条指令及其对应的回复。我们对提取得到的数据进行了全面分析，并从中筛选出30万个高质量样本。为将Magpie数据集与其他公开指令数据集进行对比，我们分别使用各数据集对Llama-3-8B-Base进行微调，并评估微调后模型的性能。实验结果表明，在部分任务中，使用Magpie微调得到的模型性能可与官方Llama-3-8B-Instruct相媲美——尽管后者通过1000万条数据的监督微调（Supervised Fine-Tuning, SFT）与后续反馈学习完成了性能增强。我们还证实，仅使用Magpie进行监督微调，即可超越此前同时用于监督微调与偏好优化的公开数据集的表现，例如结合UltraFeedback的直接偏好优化方法。该优势在AlpacaEval、ArenaHard与WildBench等对齐基准测试中均有体现。 </details> <span style="color:red">🤨 新闻：在此处查看我们的多样化思维链（Chain of Thought, CoT）风格推理数据集！</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## 数据集详情本数据集由[Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)与[Llama 3 70B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)借助[Magpie](https://huggingface.co/Magpie-Align)生成。具体而言，指令部分由Qwen2-72B-Instruct生成，回复部分则由Llama 3 70B Instruct生成。有关实现细节，请参阅我们的[论文](https://arxiv.org/abs/2406.08464)与[代码仓库](https://github.com/magpie-align/magpie)。本数据集的构建动机，在于通过高质量的指令-回复对增强模型的推理能力。你可在此处获取使用本数据集微调得到的监督微调模型权重：[Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.2](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Align-SFT-v0.2)。 ## 过滤设置 - **输入质量**：≥ 良好 - **输入难度**：≥ 简单 - **任务类别**：推理、数学、编码与调试 - **指令奖励值**：≥ -10 - **语言**：英语 - 移除重复与不完整的指令（例如以冒号结尾的指令） - 选取15万个回复最长的样本 ## 许可证请遵守[Meta Llama 3社区许可证](https://llama.meta.com/llama3/license/)、[通义千问许可协议](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)与[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en)。 ## 📚 引用若您认为本模型、数据集或代码对您的工作有所帮助，请引用我们的论文： @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} }

提供机构：

maas

创建时间：

2025-01-15

5,000+

优质数据集

54 个

任务类型

进入经典数据集