tulu-3-sft-mixture
收藏魔搭社区2026-05-15 更新2024-11-30 收录
下载链接:
https://modelscope.cn/datasets/allenai/tulu-3-sft-mixture
下载链接
链接失效反馈官方服务:
资源简介:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
# Tulu 3 SFT Mixture
*Note that this collection is licensed under ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.*
The Tulu 3 SFT mixture was used to train the [Tulu 3 series of models](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5).
It contains 939,344 samples from the following sets:
- [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 10,983 prompts (Brahman et al., 2024)
- [FLAN v2](https://github.com/google-research/FLAN/tree/main) via [`ai2-adapt-dev/flan_v2_converted`](https://huggingface.co/datasets/ai2-adapt-dev/flan_v2_converted), 89,982 prompts (Longpre et al., 2023)
- [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) (CC-BY-NC-4.0), 9,500 prompts (Rajani et al. 2023)
- [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 7,132 prompts (Kopf et al., 2024)
- [Tulu 3 Persona MATH](https://huggingface.co/datasets/allenai/tulu-3-personas-math) (ODC-BY-1.0), 149,960 prompts
- [Tulu 3 Persona GSM](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-math-grade) (ODC-BY-1.0), 49,980 prompts
- [Tulu 3 Persona Python](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-code) (ODC-BY-1.0), 34,999 prompts
- [Tulu 3 Persona Algebra](https://huggingface.co/datasets/allenai/tulu-3-personas-algebra) (ODC-BY-1.0), 20,000 prompts
- [Tulu 3 Persona IF](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following) (ODC-BY-1.0), 29,980 prompts
- [NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) (Apache 2.0), 64,312 prompts (Beeching et al. 2024)
- [Tulu 3 WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 50,000 prompts (Han et al., 2024)
- [Tulu 3 WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0), 50,000 prompts (Wildteaming, 2024)
- [Tulu 3 Hardcoded](https://huggingface.co/datasets/allenai/tulu-3-hard-coded) (CC-BY-4.0), 240 prompts
- [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 100,000 prompts (Singh et al., 2024)
- [WildChat GPT-4](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 100,000 prompts (Zhao et al., 2024)
- [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 5,000 prompts (Zha et al., 2023)
- [SciRIFF](https://huggingface.co/datasets/allenai/SciRIFF) (ODC-BY-1.0), 10,000 prompts (Wadden et al., 2024)
- [Evol CodeAlpaca](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) (Apache 2.0), 107,276 prompts (Luo et al., 2023)
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- `id` (str): a unique identifier
- `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses)
- `source` (str): the source dataset for the given sample
### Model Family
| **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** |
|----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
| **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
| **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
| **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
| **Reward Model (RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) |
## License
This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use. For more information on license and terms, consult each subset linked above.
## Citation
If Tülu3 or any of the related materials were helpful to your work, please cite:
```
@article{lambert2024tulu3,
title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
author = {
Nathan Lambert and
Jacob Morrison and
Valentina Pyatkin and
Shengyi Huang and
Hamish Ivison and
Faeze Brahman and
Lester James V. Miranda and
Alisa Liu and
Nouha Dziri and
Shane Lyu and
Yuling Gu and
Saumya Malik and
Victoria Graf and
Jena D. Hwang and
Jiangjiang Yang and
Ronan Le Bras and
Oyvind Tafjord and
Chris Wilhelm and
Luca Soldaini and
Noah A. Smith and
Yizhong Wang and
Pradeep Dasigi and
Hannaneh Hajishirzi
},
year = {2024},
email = {tulu@allenai.org}
}
```
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
# Tulu 3 监督微调混合数据集(Supervised Fine-Tuning Mixture, SFT Mixture)
*请注意,本数据集合集采用ODC-BY-1.0协议授权;其子集可能适用不同的授权协议。本数据集的部分内容仅供非商业用途。本混合数据集仅作为研究成果发布。*
Tulu 3 监督微调(Supervised Fine-Tuning, SFT)混合数据集被用于训练[Tulu 3系列模型](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5)。该数据集包含来自以下子集的939,344条样本:
- [CoCoNot](https://huggingface.co/datasets/allenai/coconot)(采用ODC-BY-1.0协议):10,983条提示词(Brahman等人,2024)
- [FLAN v2](https://github.com/google-research/FLAN/tree/main) 通过 [`ai2-adapt-dev/flan_v2_converted`](https://huggingface.co/datasets/ai2-adapt-dev/flan_v2_converted) 加载,共89,982条提示词(Longpre等人,2023)
- [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)(采用CC-BY-NC-4.0协议):9,500条提示词(Rajani等人,2023)
- [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)(采用Apache 2.0协议):7,132条提示词(Kopf等人,2024)
- [Tulu 3 Persona MATH](https://huggingface.co/datasets/allenai/tulu-3-personas-math)(采用ODC-BY-1.0协议):149,960条提示词
- [Tulu 3 Persona GSM](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-math-grade)(采用ODC-BY-1.0协议):49,980条提示词
- [Tulu 3 Persona Python](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-code)(采用ODC-BY-1.0协议):34,999条提示词
- [Tulu 3 Persona Algebra](https://huggingface.co/datasets/allenai/tulu-3-personas-algebra)(采用ODC-BY-1.0协议):20,000条提示词
- [Tulu 3 Persona IF](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following)(采用ODC-BY-1.0协议):29,980条提示词
- [NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR)(采用Apache 2.0协议):64,312条提示词(Beeching等人,2024)
- [Tulu 3 WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)(采用Apache 2.0协议):50,000条提示词(Han等人,2024)
- [Tulu 3 WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)(采用ODC-BY-1.0协议):50,000条提示词(Wildteaming,2024)
- [Tulu 3 Hardcoded](https://huggingface.co/datasets/allenai/tulu-3-hard-coded)(采用CC-BY-4.0协议):240条提示词
- [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)(采用Apache 2.0协议):100,000条提示词(Singh等人,2024)
- [WildChat GPT-4](https://huggingface.co/datasets/allenai/WildChat-1M)(采用ODC-BY-1.0协议):100,000条提示词(Zhao等人,2024)
- [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)(采用MIT协议):5,000条提示词(Zha等人,2023)
- [SciRIFF](https://huggingface.co/datasets/allenai/SciRIFF)(采用ODC-BY-1.0协议):10,000条提示词(Wadden等人,2024)
- [Evol CodeAlpaca](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1)(采用Apache 2.0协议):107,276条提示词(Luo等人,2023)
## 数据集结构
本数据集的每条样本均包含标准的指令微调(Instruction Tuning)数据格式,具体如下:
- `id`(字符串类型):唯一标识符
- `messages`(列表类型):用于监督微调的消息格式(包含用户提示词与助手回复)
- `source`(字符串类型):当前样本所属的源数据集
## 模型家族
| **阶段** | **Llama 3.1 8B** | **Llama 3.1 70B** |
|----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| **基础模型** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
| **监督微调(SFT)** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
| **偏好对齐(DPO,Direct Preference Optimization)** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
| **最终模型(RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
| **奖励模型(Reward Model, RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (与8B版本一致) |
## 授权协议
本数据集采用ODC-BY-1.0协议授权,仅可用于研究与教育用途,并需遵循艾伦人工智能研究所(Allen Institute for AI, Ai2)的[负责任使用指南](https://allenai.org/responsible-use)。本数据集包含由第三方模型生成的输出数据,此类数据受其自身独立使用条款约束。如需了解详细授权与使用条款,请查阅上文链接的各子集页面。
## 引用
若Tülu3或其相关材料对你的研究有所帮助,请引用如下文献:
@article{lambert2024tulu3,
title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
author = {
Nathan Lambert and
Jacob Morrison and
Valentina Pyatkin and
Shengyi Huang and
Hamish Ivison and
Faeze Brahman and
Lester James V. Miranda and
Alisa Liu and
Nouha Dziri and
Shane Lyu and
Yuling Gu and
Saumya Malik and
Victoria Graf and
Jena D. Hwang and
Jiangjiang Yang and
Ronan Le Bras and
Oyvind Tafjord and
Chris Wilhelm and
Luca Soldaini and
Noah A. Smith and
Yizhong Wang and
Pradeep Dasigi and
Hannaneh Hajishirzi
},
year = {2024},
email = {tulu@allenai.org}
}
提供机构:
maas
创建时间:
2025-05-28



