llama-3.1-tulu-3-8b-preference-mixture
Source: ModelScope community (魔搭社区). Updated 2026-04-28; indexed 2024-11-30.
Download link: https://modelscope.cn/datasets/allenai/llama-3.1-tulu-3-8b-preference-mixture
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:auto; margin-right:auto; display:block"/>
# Tulu 3 8B Preference Mixture
*Note that this collection is licensed under ODC-BY-1.0; different licenses apply to subsets of the data, and some portions of the dataset are non-commercial. We present the mixture as a research artifact.*
This mix is made up of the following preference datasets:
- https://huggingface.co/datasets/allenai/tulu-3-sft-reused-off-policy
- https://huggingface.co/datasets/allenai/tulu-3-sft-reused-on-policy-8b
- https://huggingface.co/datasets/allenai/tulu-3-wildchat-if-on-policy-8b
- https://huggingface.co/datasets/allenai/tulu-3-IF-augmented-on-policy-8b
- https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following
- https://huggingface.co/datasets/allenai/tulu-3-wildchat-reused-on-policy-8b
- https://huggingface.co/datasets/allenai/tulu-3-ultrafeedback-cleaned-on-policy-8b
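As a minimal sketch, the component datasets listed above could be loaded individually with the Hugging Face `datasets` library. The helper names `repo_id_from_url` and `load_components` are hypothetical, and the `train` split name is an assumption; check each dataset card before relying on it.

```python
from urllib.parse import urlparse

# Component dataset URLs copied from the list above.
COMPONENT_URLS = [
    "https://huggingface.co/datasets/allenai/tulu-3-sft-reused-off-policy",
    "https://huggingface.co/datasets/allenai/tulu-3-sft-reused-on-policy-8b",
    "https://huggingface.co/datasets/allenai/tulu-3-wildchat-if-on-policy-8b",
    "https://huggingface.co/datasets/allenai/tulu-3-IF-augmented-on-policy-8b",
    "https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following",
    "https://huggingface.co/datasets/allenai/tulu-3-wildchat-reused-on-policy-8b",
    "https://huggingface.co/datasets/allenai/tulu-3-ultrafeedback-cleaned-on-policy-8b",
]


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face dataset URL into an 'owner/name' repo id."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_components(split: str = "train"):
    """Load every component dataset, keyed by repo id.

    Needs network access and the `datasets` package. The default
    split name is an assumption, not something the card states.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    return {
        repo_id_from_url(url): load_dataset(repo_id_from_url(url), split=split)
        for url in COMPONENT_URLS
    }
```

Calling `load_components()` downloads all seven datasets, so it may be worth loading a single repo id first to inspect its columns.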
This preference mixture was used for DPO on our [Llama 3.1 Tulu 3 8B SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) checkpoint to obtain [Llama 3.1 Tulu 3 8B DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO).
It contains 272,898 generation pairs obtained using the following models:
- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Apache 2.0)
- [Mistral Nemo Instruct 2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) (Apache 2.0)
- [Tulu 2 7B](https://huggingface.co/allenai/tulu-2-7b) (Ai2 ImpACT Low Risk License)
- [Tulu 2 13B](https://huggingface.co/allenai/tulu-2-13b) (Ai2 ImpACT Low Risk License)
- [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat) (Apache 2.0)
- [Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) (Apache 2.0)
- [MPT 30B Chat](https://huggingface.co/mosaicml/mpt-30b-chat) (CC-BY-SA-4.0)
- [MPT 7B 8k Chat](https://huggingface.co/mosaicml/mpt-7b-8k-chat) (CC-BY-SA-4.0)
- [Google Gemma 2 27B it](https://huggingface.co/google/gemma-2-27b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [Google Gemma 2 9B it](https://huggingface.co/google/gemma-2-9b-it) (Gemma is provided under and subject to the Gemma Terms of Use found at [ai.google.dev/gemma/terms](https://ai.google.dev/gemma/terms))
- [InternLM2.5 20B](https://huggingface.co/internlm/internlm2_5-20b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 7B](https://huggingface.co/internlm/internlm2_5-7b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [InternLM2.5 1.8B](https://huggingface.co/internlm/internlm2_5-1_8b-chat) (InternLM weights are fully open for academic research and also allow free commercial usage. A commercial license can be obtained as instructed in the model card.)
- [Falcon 7B](https://huggingface.co/tiiuae/falcon-7b-instruct) (Apache 2.0)
- [Qwen2.5 72B Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) (Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.)
- [Qwen2.5 32B Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) (Apache 2.0)
- [Qwen2.5 14B Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (Apache 2.0)
- [Qwen2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (Apache 2.0)
- [Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3.1 70B Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) (this dataset was partially "Built with Llama" and is thus subject to the Llama 3.1 License)
- [Llama 3 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B) (this dataset was partially "Built with Meta Llama 3" and is thus subject to the Llama 3 License)
- [GPT-4 Turbo](https://openai.com/index/new-models-and-developer-products-announced-at-devday/) and [GPT-4o](https://openai.com/index/hello-gpt-4o/) (Outputs produced by GPT-4 are subject to OpenAI's [terms of use](https://openai.com/policies/row-terms-of-use))
- [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet) (Outputs produced by Claude are subject to Anthropic [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup))
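The mixture is described above as training data for DPO. For readers unfamiliar with the method, the sketch below computes the standard DPO loss for a single chosen/rejected pair from sequence log-probabilities. This card does not state which DPO variant or hyperparameters were used for Tulu 3, so the plain objective, the function name `dpo_loss`, and `beta=0.1` are illustrative assumptions only.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being trained and under a frozen
    reference model. `beta` is an illustrative value, not one taken
    from the dataset card.
    """
    # Implicit rewards: how much more likely each response is under
    # the policy than under the reference model.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is `log(2)`; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.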
### Model Family
| **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** |
|----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
| **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
| **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
| **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
| **Reward Model (RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) |
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use.
## Citation
If Tülu 3 or any of the related materials were helpful to your work, please cite:
```
@article{lambert2024tulu3,
title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
author = {
Nathan Lambert and
Jacob Morrison and
Valentina Pyatkin and
Shengyi Huang and
Hamish Ivison and
Faeze Brahman and
Lester James V. Miranda and
Alisa Liu and
Nouha Dziri and
Shane Lyu and
Yuling Gu and
Saumya Malik and
Victoria Graf and
Jena D. Hwang and
Jiangjiang Yang and
Ronan Le Bras and
Oyvind Tafjord and
Chris Wilhelm and
Luca Soldaini and
Noah A. Smith and
Yizhong Wang and
Pradeep Dasigi and
Hannaneh Hajishirzi
},
year = {2024},
email = {tulu@allenai.org}
}
```
Provider: maas
Created: 2025-05-28
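Background overview
This dataset is a preference mixture used for DPO training of the Llama 3.1 Tulu 3 8B model. It combines several sub-datasets and contains 272,898 generation pairs. The data was generated by a variety of models. The mixture as a whole is licensed under ODC-BY, but some portions are restricted to non-commercial use.
The summary above was collected and generated by 遇见数据集.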



