five

tulu-3-sft-mixture

收藏
魔搭社区2026-05-15 更新2024-11-30 收录
下载链接:
https://modelscope.cn/datasets/allenai/tulu-3-sft-mixture
下载链接
链接失效反馈
官方服务:
资源简介:
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> # Tulu 3 SFT Mixture *Note that this collection is licensed under ODC-BY-1.0 license; different licenses apply to subsets of the data. Some portions of the dataset are non-commercial. We present the mixture as a research artifact.* The Tulu 3 SFT mixture was used to train the [Tulu 3 series of models](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5). It contains 939,344 samples from the following sets: - [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 10,983 prompts (Brahman et al., 2024) - [FLAN v2](https://github.com/google-research/FLAN/tree/main) via [`ai2-adapt-dev/flan_v2_converted`](https://huggingface.co/datasets/ai2-adapt-dev/flan_v2_converted), 89,982 prompts (Longpre et al., 2023) - [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots) (CC-BY-NC-4.0), 9,500 prompts (Rajani et al. 2023) - [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 7,132 prompts (Kopf et al., 2024) - [Tulu 3 Persona MATH](https://huggingface.co/datasets/allenai/tulu-3-personas-math) (ODC-BY-1.0), 149,960 prompts - [Tulu 3 Persona GSM](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-math-grade) (ODC-BY-1.0), 49,980 prompts - [Tulu 3 Persona Python](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-code) (ODC-BY-1.0), 34,999 prompts - [Tulu 3 Persona Algebra](https://huggingface.co/datasets/allenai/tulu-3-personas-algebra) (ODC-BY-1.0), 20,000 prompts - [Tulu 3 Persona IF](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following) (ODC-BY-1.0), 29,980 prompts - [NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) (Apache 2.0), 64,312 prompts (Beeching et al. 2024) - [Tulu 3 WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 50,000 prompts (Han et al., 2024) - [Tulu 3 WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0), 50,000 prompts (Wildteaming, 2024) - [Tulu 3 Hardcoded](https://huggingface.co/datasets/allenai/tulu-3-hard-coded) (CC-BY-4.0), 240 prompts - [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 100,000 prompts (Singh et al., 2024) - [WildChat GPT-4](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 100,000 prompts (Zhao et al., 2024) - [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 5,000 prompts (Zha et al., 2023) - [SciRIFF](https://huggingface.co/datasets/allenai/SciRIFF) (ODC-BY-1.0), 10,000 prompts (Wadden et al., 2024) - [Evol CodeAlpaca](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) (Apache 2.0), 107,276 prompts (Luo et al., 2023) ## Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - `id` (str): a unique identifier - `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses) - `source` (str): the source dataset for the given sample ### Model Family | **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | | **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) | | **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) | | **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) | | **Reward Model (RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) | ## License This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from third party models that are subject to separate terms governing their use. For more information on license and terms, consult each subset linked above. ## Citation If Tülu3 or any of the related materials were helpful to your work, please cite: ``` @article{lambert2024tulu3, title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training}, author = { Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and Pradeep Dasigi and Hannaneh Hajishirzi }, year = {2024}, email = {tulu@allenai.org} } ```

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-3/Tulu3-logo.png" alt="Tulu3 banner" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/> # Tulu 3 监督微调混合数据集(Supervised Fine-Tuning Mixture, SFT Mixture) *请注意,本数据集合集采用ODC-BY-1.0协议授权;其子集可能适用不同的授权协议。本数据集的部分内容仅供非商业用途。本混合数据集仅作为研究成果发布。* Tulu 3 监督微调(Supervised Fine-Tuning, SFT)混合数据集被用于训练[Tulu 3系列模型](https://huggingface.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5)。该数据集包含来自以下子集的939,344条样本: - [CoCoNot](https://huggingface.co/datasets/allenai/coconot)(采用ODC-BY-1.0协议):10,983条提示词(Brahman等人,2024) - [FLAN v2](https://github.com/google-research/FLAN/tree/main) 通过 [`ai2-adapt-dev/flan_v2_converted`](https://huggingface.co/datasets/ai2-adapt-dev/flan_v2_converted) 加载,共89,982条提示词(Longpre等人,2023) - [No Robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)(采用CC-BY-NC-4.0协议):9,500条提示词(Rajani等人,2023) - [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)(采用Apache 2.0协议):7,132条提示词(Kopf等人,2024) - [Tulu 3 Persona MATH](https://huggingface.co/datasets/allenai/tulu-3-personas-math)(采用ODC-BY-1.0协议):149,960条提示词 - [Tulu 3 Persona GSM](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-math-grade)(采用ODC-BY-1.0协议):49,980条提示词 - [Tulu 3 Persona Python](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-code)(采用ODC-BY-1.0协议):34,999条提示词 - [Tulu 3 Persona Algebra](https://huggingface.co/datasets/allenai/tulu-3-personas-algebra)(采用ODC-BY-1.0协议):20,000条提示词 - [Tulu 3 Persona IF](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following)(采用ODC-BY-1.0协议):29,980条提示词 - [NuminaMath-TIR](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR)(采用Apache 2.0协议):64,312条提示词(Beeching等人,2024) - [Tulu 3 WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)(采用Apache 2.0协议):50,000条提示词(Han等人,2024) - [Tulu 3 WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)(采用ODC-BY-1.0协议):50,000条提示词(Wildteaming,2024) - [Tulu 3 Hardcoded](https://huggingface.co/datasets/allenai/tulu-3-hard-coded)(采用CC-BY-4.0协议):240条提示词 - [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)(采用Apache 2.0协议):100,000条提示词(Singh等人,2024) - [WildChat GPT-4](https://huggingface.co/datasets/allenai/WildChat-1M)(采用ODC-BY-1.0协议):100,000条提示词(Zhao等人,2024) - [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)(采用MIT协议):5,000条提示词(Zha等人,2023) - [SciRIFF](https://huggingface.co/datasets/allenai/SciRIFF)(采用ODC-BY-1.0协议):10,000条提示词(Wadden等人,2024) - [Evol CodeAlpaca](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1)(采用Apache 2.0协议):107,276条提示词(Luo等人,2023) ## 数据集结构 本数据集的每条样本均包含标准的指令微调(Instruction Tuning)数据格式,具体如下: - `id`(字符串类型):唯一标识符 - `messages`(列表类型):用于监督微调的消息格式(包含用户提示词与助手回复) - `source`(字符串类型):当前样本所属的源数据集 ## 模型家族 | **阶段** | **Llama 3.1 8B** | **Llama 3.1 70B** | |----------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| | **基础模型** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | | **监督微调(SFT)** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) | | **偏好对齐(DPO,Direct Preference Optimization)** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) | | **最终模型(RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) | | **奖励模型(Reward Model, RM)**| [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (与8B版本一致) | ## 授权协议 本数据集采用ODC-BY-1.0协议授权,仅可用于研究与教育用途,并需遵循艾伦人工智能研究所(Allen Institute for AI, Ai2)的[负责任使用指南](https://allenai.org/responsible-use)。本数据集包含由第三方模型生成的输出数据,此类数据受其自身独立使用条款约束。如需了解详细授权与使用条款,请查阅上文链接的各子集页面。 ## 引用 若Tülu3或其相关材料对你的研究有所帮助,请引用如下文献: @article{lambert2024tulu3, title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training}, author = { Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and Pradeep Dasigi and Hannaneh Hajishirzi }, year = {2024}, email = {tulu@allenai.org} }
提供机构:
maas
创建时间:
2025-05-28
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作