five

Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B

收藏
魔搭社区2026-04-28 更新2025-02-01 收录
下载链接:
https://modelscope.cn/datasets/Magpie-Align/Magpie-Reasoning-V1-150K-CoT-Deepseek-R1-Llama-70B
下载链接
链接失效反馈
官方服务:
资源简介:
![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## Abstract <details><summary>Click Here</summary> High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the diversity and quality of public alignment datasets. Is it possible to synthesize high-quality instruction data at scale by extracting it directly from an aligned LLM? We present a self-synthesis method for generating large-scale alignment data named Magpie. Our key observation is that aligned LLMs like Llama-3-Instruct can generate a user query when we input only the left-side templates up to the position reserved for user messages, thanks to their auto-regressive nature. We use this method to prompt Llama-3-Instruct and generate 4 million instructions along with their corresponding responses. We perform a comprehensive analysis of the extracted data and select 300K high-quality instances. To compare Magpie data with other public instruction datasets, we fine-tune Llama-3-8B-Base with each dataset and evaluate the performance of the fine-tuned models. Our results indicate that in some tasks, models fine-tuned with Magpie perform comparably to the official Llama-3-8B-Instruct, despite the latter being enhanced with 10 million data points through supervised fine-tuning (SFT) and subsequent feedback learning. We also show that using Magpie solely for SFT can surpass the performance of previous public datasets utilized for both SFT and preference optimization, such as direct preference optimization with UltraFeedback. This advantage is evident on alignment benchmarks such as AlpacaEval, ArenaHard, and WildBench. </details><be> <span style="color:red">🤨 News: Take a look on our new reasoning datasets with diverse CoT styles here!</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## Dataset Details This dataset is generated by [Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct) and [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) using [Magpie framework](https://huggingface.co/Magpie-Align). Specifically, the instructions are generated by Qwen2-72B-Instruct, and the responses are generated by DeepSeek-R1-Distill-Llama-70B. Please refer to our [paper](https://arxiv.org/abs/2406.08464) and [codebase](https://github.com/magpie-align/magpie) for implementation details. The motivation for developing this dataset is to augment the reasoning capabilities of our models through the utilization of high-quality instruction-response pairs. ## Instruction and Response Sources The instructions come from [Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K). Please refer to the corresponding dataset card for details. The responses are generated by [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B). Please note that for this release, **we do not apply any response filter**. If you are going to train your LLMs using this dataset, we recommend performing dataset filtering before training. ## License We release this dataset for research purpose only. Please follow [Tongyi Qianwen Lincense Agreement](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT) and [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en) for other usage. ## 📚 Citation If you find the model, data, or code useful, please cite our paper: ``` @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} }

![喜鹊(Magpie)](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png) 项目主页:[https://magpie-align.github.io/](https://magpie-align.github.io/) Arxiv技术报告:[https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464) 代码仓库:[https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie) ## 摘要 <details><summary>点击展开</summary> 高质量的指令数据对大语言模型(Large Language Model,LLM)的对齐任务至关重要。尽管部分模型(如Llama-3-Instruct)已开源权重,但其对齐数据仍处于私有状态,这阻碍了人工智能的民主化进程。当前开源数据构建方法面临人工成本高昂、提示范围预定义且有限的问题,难以实现有效扩展,进而可能限制公开对齐数据集的多样性与质量。我们能否直接从已对齐的大语言模型中提取信息,以大规模合成高质量的指令数据?为此,我们提出了一种用于生成大规模对齐数据的自合成方法,命名为Magpie(喜鹊)。我们的核心观察在于:得益于自回归特性,当仅输入用户消息预留位置之前的左侧模板时,已对齐的大语言模型(如Llama-3-Instruct)能够生成用户查询。我们利用该方法对Llama-3-Instruct进行提示,生成了400万条指令及其对应的响应。我们对提取得到的数据进行了全面分析,并从中筛选出30万个高质量实例。为了将Magpie数据集与其他公开指令数据集进行对比,我们分别使用各数据集对Llama-3-8B-Base进行微调,并评估微调后模型的性能。实验结果表明,在部分任务中,使用Magpie数据集微调得到的模型性能可与官方的Llama-3-8B-Instruct相媲美,尽管后者通过监督微调(Supervised Fine-Tuning,SFT)及后续反馈学习,使用了1000万条数据进行增强。我们还证实,仅使用Magpie数据集进行监督微调,其性能便可超越此前同时用于监督微调与偏好优化的公开数据集(如结合UltraFeedback的直接偏好优化)。这一优势在AlpacaEval、ArenaHard与WildBench等对齐基准测试中均有体现。 </details><br> <span style="color:red">🤨 消息:快来查看我们的多样化思维链(Chain of Thought,CoT)风格推理数据集!</span> - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Llama3) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-QwQ) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Skywork-O1-Llama-3.1-8B) - [Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B) --- ## 数据集详情 本数据集由[Qwen2-72B-Instruct](https://huggingface.co/Qwen/Qwen2-72B-Instruct)与[deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)基于[Magpie框架(Magpie framework)](https://huggingface.co/Magpie-Align)生成。具体而言,指令由Qwen2-72B-Instruct生成,响应则由DeepSeek-R1-Distill-Llama-70B生成。有关实现细节,请参阅我们的[论文](https://arxiv.org/abs/2406.08464)与[代码库](https://github.com/magpie-align/magpie)。 开发本数据集的动机在于,通过利用高质量的指令-响应对,增强我们模型的推理能力。 ## 指令与响应来源 指令源自[Magpie-Align/Magpie-Reasoning-V1-150K](https://huggingface.co/datasets/Magpie-Align/Magpie-Reasoning-V1-150K),详细信息请参阅对应数据集卡片。 响应由[deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B)生成。请注意,本次发布的数据集**未应用任何响应过滤**。若您计划使用本数据集训练大语言模型,我们建议在训练前先进行数据集过滤。 ## 许可协议 本数据集仅用于研究目的发布。其他用途请遵循[通义千问许可协议(Tongyi Qianwen LICENSE AGREEMENT)](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT)与[知识共享署名-非商业性使用4.0国际许可协议(CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/deed.en)。 ## 📚 引用 若您认为本模型、数据集或代码对您的工作有所帮助,请引用我们的论文: @article{xu2024magpie, title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing}, author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin}, year={2024}, eprint={2406.08464}, archivePrefix={arXiv}, primaryClass={cs.CL} }
提供机构:
maas
创建时间:
2025-01-28
搜集汇总
数据集介绍
main_image_url
背景与挑战
背景概述
该数据集是Magpie项目的一部分,旨在通过自我合成方法生成大规模、高质量的指令数据以增强大型语言模型的推理能力。具体来说,它包含150K个指令-响应对,其中指令由Qwen2-72B-Instruct生成,响应由DeepSeek-R1-Distill-Llama-70B生成,采用思维链(CoT)风格,但未经过滤,建议用户在训练前自行筛选。数据集遵循Apache License 2.0,但仅限研究使用,其他用途需遵守额外许可协议。
以上内容由遇见数据集搜集并总结生成
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作