llama-3.1-tulu-3-405b-preference-mixture

Name: llama-3.1-tulu-3-405b-preference-mixture
Creator: maas
Published: 2025-12-05 16:36:36
License: No description provided

ModelScope community; updated 2025-12-05; indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/allenai/llama-3.1-tulu-3-405b-preference-mixture
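As a minimal sketch of how the pairs behind the download link above might be inspected, the snippet below streams a few rows with the Hugging Face `datasets` library. It rests on assumptions not stated on this page: that the dataset is reachable on the Hugging Face Hub under the same `allenai/...` id as in the URL, that it has a `train` split, and that each row has `chosen` and `rejected` columns holding chat-style message lists. The helper names `final_reply` and `iter_pairs` are hypothetical.

```python
from typing import Dict, Iterator, List

# Assumed Hub id: the path component of the download URL above.
DATASET_ID = "allenai/llama-3.1-tulu-3-405b-preference-mixture"


def final_reply(messages: List[Dict[str, str]]) -> str:
    """Return the content of the last assistant turn in a chat-style message list."""
    for message in reversed(messages):
        if message.get("role") == "assistant":
            return message.get("content", "")
    return ""


def iter_pairs(limit: int = 3) -> Iterator[Dict[str, str]]:
    """Stream a few preference pairs without downloading the whole dataset.

    Assumes `chosen` and `rejected` columns that hold lists of
    {"role": ..., "content": ...} messages; adjust if the schema differs.
    """
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, split="train", streaming=True)
    for index, row in enumerate(stream):
        if index >= limit:
            break
        yield {
            "chosen": final_reply(row["chosen"]),
            "rejected": final_reply(row["rejected"]),
        }
```

Calling `for pair in iter_pairs(2): print(pair["chosen"][:80])` would print the start of two preferred replies, provided the assumed schema holds. Streaming is used here only so that a quick look does not require fetching the full mixture.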


Resource description:

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# Llama 3.1 Tulu 3 405B Preference Mixture

*Note that this collection is licensed under the ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*

This preference mixture was used for DPO on our [Llama 3.1 Tulu 3 405B SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-405B-SFT) checkpoint to obtain [Llama 3.1 Tulu 3 405B DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-405B-DPO).

It contains 360,924 generation pairs obtained using the following models:

- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) (this dataset was partially "Built with Meta Llama 3" and is thus subject to the Llama 3 License)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) (Outputs produced by Claude are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup))

### Model Family

| **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** |
|---|---|---|
| **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
| **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
| **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
| **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
| **Reward Model (RM)** | [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) |

## License

This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third-party models that are subject to separate terms governing their use.

## Citation

If Tülu3 or any of the related materials were helpful to your work, please cite:

```
@article{lambert2024tulu3,
  title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year = {2024},
  email = {tulu@allenai.org}
}
```
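The description above says the mixture was used for DPO but does not spell out the objective. As a rough illustration only, the sketch below computes the standard per-pair DPO loss from summed log-probabilities of a chosen and a rejected response under the policy and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example; the actual training recipe may use a different variant or hyperparameters.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default, not taken from the source
) -> float:
    """Standard DPO loss for one preference pair.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy favors the chosen response over the rejected one, relative to
    the reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy matches the reference, the margin is zero and the loss is `log(2)`; it falls as the policy assigns relatively more probability to the chosen response than the reference does.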


Provider:

maas

Created:

2025-05-28
