Magpie-Reasoning-V2-250K-CoT-Llama3

Name: Magpie-Reasoning-V2-250K-CoT-Llama3
Creator: maas
Published: 2026-01-06 16:20:14
License: 暂无描述

魔搭社区2026-01-06 更新2025-01-18 收录

下载链接：

https://modelscope.cn/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3

下载链接

链接失效反馈

官方服务：

资源简介：

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## Abstract <details><summary>Click Here</summary> High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench. </details><be> 🤨 Also take a look at our V1 (150K data) with new response generators here: - [Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K) (Llama3-70B-Instruct) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ) (QwQ-32B-Preview) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Align/Skywork-O1-Llama-3.1-8B) (Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B) (Deepseek-R1-Llama-70B) 🤨 Take a look on more diverse CoT styles here! - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) [You're here!] - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## Dataset Details This dataset contains instruction-response pairs generated by Meta's Llama 3.1 and 3.3 70B Instruct models using Magpie. Our filtering approach specifically targets **Chain-of-Thought (CoT) patterns** in both instructions and responses. We observed that Llama 3.1 and 3.3 Instruct models exhibit patterns of **overfitting to CoT-style data**. Specifically, when applying Magpie to extract instructions, we discovered CoT markers (e.g., "## Step 1") appearing within the extracted instructions themselves. This dataset represents a curated subset of the raw Magpie datasets, where we: - Filtered out raw instructions containing explicit CoT patterns (see `raw_instruction` column) - Truncated text before `## Step 1` to form instructions - Generated responses and retained those that demonstrate Llama-style Chain-of-Thought reasoning (e.g., with `## Step 1`) **Disclaimer**: The responses generated by the Llama models have not been validated for accuracy. As a result, model performance may vary across different tasks when trained on this dataset. **License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and [Meta Llama 3.3 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE). ### Available Labels - **Raw Instruction**: The raw instruction generated by Magpie without any truncation. - **Input Length**: The total number of characters in the instructions. - **Output Length**: The total number of characters in the responses. - **Task Category**: The specific category of the instructions. - **Input Quality**: The clarity, specificity, and coherence of the instructions, rated as 'very poor', 'poor', 'average', 'good', and 'excellent'. - **Input Difficulty**: The level of knowledge required to address the task described in the instruction, rated as 'very easy', 'easy', 'medium', 'hard', or 'very hard'. - **Safety**: Safety tags marked by [meta-llama/Meta-Llama-Guard-2-8B](https://huggingface.co/meta-llama/Meta-Llama-Guard-2-8B) - **Reward**: The output of the reward model given the specific instruction-response pair. - **Language**: The language of the instruction. ## 📚 Citation If you find the model, data, or code useful, please cite our paper: ``` @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) 项目主页：[https://magpie-align.github.io/](https://magpie-align.github.io/) arXiv技术报告：[https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) 代码仓库：[https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## 摘要 <details><summary>点击展开</summary> 高质量的指令数据对大语言模型（Large Language Model, LLM）的对齐至关重要。尽管Llama-3-Instruct等部分模型已开放权重，但其对齐数据仍处于私有状态，这阻碍了人工智能的民主化进程。现有开源数据构建方法面临人工成本高昂、提示范围预定义且有限的问题，难以有效扩展，进而可能限制了公开对齐数据集的多样性与质量。能否直接从已对齐的大语言模型中提取数据，从而规模化合成高质量的指令数据？为此，我们提出了一种用于生成大规模对齐数据的自合成方法，命名为Magpie。我们的核心观察是：得益于自回归特性，像Llama-3-Instruct这样的已对齐大语言模型，仅需输入到用户消息预留位置之前的左侧模板，即可生成用户查询。我们利用该方法对Llama-3-Instruct进行提示，生成了400万条指令及其对应的响应。我们对提取得到的数据进行了全面分析，并筛选出30万个高质量样本。为了将Magpie数据集与其他公开指令数据集进行对比，我们分别使用各数据集对Llama-3-8B-Base进行微调，并评估微调后模型的性能。结果表明，在部分任务中，使用Magpie数据微调的模型性能可与官方Llama-3-8B-Instruct相媲美——尽管后者通过监督微调（Supervised Fine-Tuning, SFT）及后续反馈学习，使用了1000万条数据进行增强。我们还证实，仅使用Magpie数据进行监督微调，其性能可超越此前同时用于监督微调与偏好优化的公开数据集，例如结合UltraFeedback的直接偏好优化（Direct Preference Optimization, DPO）方法。该优势在AlpacaEval、ArenaHard及WildBench等对齐基准测试中均有体现。 </details> 🤨 同时可查看我们的V1版本（15万条数据），其配套的新型响应生成器如下： - [Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K)（基于Llama3-70B-Instruct） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-QwQ)（基于QwQ-32B-Preview） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Skywork-O1-Llama-3.1-8B)（基于Skywork-O1-Llama-3.1-8B） - [Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B)（基于Deepseek-R1-Llama-70B） 🤨 在此处查看更多样化的思维链（Chain-of-Thought, CoT）风格数据集！ - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) [您当前所在的数据集！] - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## 数据集详情本数据集包含由Meta的Llama 3.1及3.3 70B Instruct模型通过Magpie方法生成的指令-响应对。我们的筛选策略专门针对指令与响应中的**思维链（Chain-of-Thought, CoT）模式**。我们观察到，Llama 3.1与3.3 Instruct模型存在对思维链风格数据的过拟合倾向。具体而言，在使用Magpie方法提取指令时，我们发现提取出的指令本身包含思维链标记（例如"## Step 1"）。本数据集是原始Magpie数据集的精选子集，我们的处理步骤如下： - 过滤掉包含显式思维链模式的原始指令（详见`raw_instruction`字段） - 在`## Step 1`之前截断文本以形成最终指令 - 生成响应并保留那些体现Llama风格思维链推理的结果（例如包含`## Step 1`的响应） **免责声明**：Llama模型生成的响应未经过准确性验证，因此基于本数据集训练的模型在不同任务上的性能可能存在差异。 **许可证**：请遵守[Meta Llama 3.1社区许可证](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)与[Meta Llama 3.3社区许可证](https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/LICENSE)。 ### 可用字段 - **原始指令**：未经过任何截断处理的、由Magpie生成的原始指令 - **输入长度**：指令的总字符数 - **输出长度**：响应的总字符数 - **任务类别**：指令所属的具体任务分类 - **输入质量**：指令的清晰度、特异性与连贯性，评级分为"极差""较差""一般""良好"与"优秀" - **输入难度**：指令描述的任务所需的知识水平，评级分为"极简单""简单""中等""困难"与"极困难" - **安全性**：由[meta-llama/Meta-Llama-Guard-2-8B](https://huggingface.co/meta-llama/Meta-Llama-Guard-2-8B)标记的安全标签 - **奖励得分**：针对特定指令-响应对的奖励模型输出结果 - **语言**：指令所使用的语言 ## 📚 引用若您认为本模型、数据集或代码对您的工作有所帮助，请引用我们的论文： @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} }

提供机构：

maas

创建时间：

2025-01-15

5,000+

优质数据集

54 个

任务类型

进入经典数据集