Dolci-Think-SFT-7B
收藏魔搭社区2026-01-08 更新2025-11-22 收录
下载链接:
https://modelscope.cn/datasets/allenai/Dolci-Think-SFT-7B
下载链接
链接失效反馈官方服务:
资源简介:
# Dolci-Think-SFT
Sources include a mixture of existing reasoning traces:
* [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M) (Apache 2.0): Extended to 32K context length and downsampled code prompts to 16X multiple, to 941,166 total prompts. Access our version, Dolci OpenThoughts 3 here.
* [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified) (Apache 2.0) via the SFT-Verified split, 104,569 prompts.
* [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) (CC BY 4), code split only, 113,777 prompts.
New prompts and new reasoning traces from us (all ODC-BY-1.0):
* Dolci Think Persona IF: New precise instruction following prompts and traces created with [Nvidia's Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA). 223,123 prompts.
* Dolci Precise IF: New multi-constraint instruction following data building off Pyatkin, Valentina, et al. "[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)." (2025). 135,792 prompts.
* [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python): 466,677 prompts (subsampled from larger mix).
Existing prompts with new reasoning traces, largely repurposed from Tülu 3 / OLMo 2, with new traces generated by a mix of DeepSeek R1 and DeepSeek R1 0528:
* [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 83,054 prompts.
* [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 6,800 prompts.
* [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 10,227 prompts.
* [WildGuardMix ](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 38,315 prompts.
* [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0) 41,100 prompts.
* [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 98,597 prompts.
* [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 4,981 prompts.
* Olmo Identity Prompts, 58 samples (we trained with 290, 5 repetitions per prompt, uploaded single repetition to HuggingFace)
The counts are smaller than the original prompt sources pulled from Tülu 3 / OLMo 2 due to more extensive filtering for data quality and by topics within the Azure API (blocked requests).
This dataset was used for 7B post-training, the [7B dataset](https://huggingface.co/datasets/allenai/Dolci-Think-SFT) is slightly different.
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- `id` (str): a unique identifier
- `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses)
- `source` (str): the source dataset for the given sample
Every datapoint contains the model's reasoning in `<think>...</think>` and NO `<answer>...</answer>` tags -- the answer follows directly after `</think>`.
## Model Family
| **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
```
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
```
# Dolci-Think-SFT
本数据集的来源包含多种现有推理轨迹:
* [OpenThoughts 3(OpenThoughts 3)](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)(Apache 2.0协议):该数据集已被扩展至32K上下文长度,并将代码提示按16倍比例下采样,最终总计941,166条提示。可在此处获取我们整理的版本:Dolci OpenThoughts 3。
* [SYNTHETIC-2(SYNTHETIC-2)](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified)(Apache 2.0协议):采用其SFT-Verified划分子集,包含104,569条提示。
* [Nemotron 后训练数据集(Nemotron Post-training dataset)](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1)(CC BY 4协议):仅采用其代码划分子集,包含113,777条提示。
本团队新增的提示与推理轨迹(均采用ODC-BY-1.0协议):
* Dolci Think Persona IF:基于[Nvidia Nemotron 后训练角色集(Nvidia's Nemotron Post-training Personas)](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA)构建的精准指令遵循提示与推理轨迹,总计223,123条提示。
* Dolci Precise IF:基于Pyatkin, Valentina等人于2025年发表的论文《可验证指令遵循的泛化方法》(Generalizing Verifiable Instruction Following,https://arxiv.org/abs/2507.02833)构建的多约束指令遵循数据集,总计135,792条提示。
* [Dolci Think Python(Dolci Think Python)](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python):从更大规模混合数据中采样得到的466,677条提示。
基于现有提示新增推理轨迹的子集,主要源自Tülu 3与OLMo 2数据集,新增推理轨迹由DeepSeek R1与DeepSeek R1 0528联合生成:
* [WildChat(WildChat)](https://huggingface.co/datasets/allenai/WildChat-1M)(ODC-BY-1.0协议):83,054条提示。
* [OpenAssistant Guanaco(OpenAssistant Guanaco)](https://huggingface.co/datasets/OpenAssistant/oasst1)(Apache 2.0协议):6,800条提示。
* [CoCoNot(CoCoNot)](https://huggingface.co/datasets/allenai/coconot)(ODC-BY-1.0协议):10,227条提示。
* [WildGuardMix(WildGuardMix)](https://huggingface.co/datasets/allenai/wildguardmix)(Apache 2.0协议):38,315条提示。
* [WildJailbreak(WildJailbreak)](https://huggingface.co/datasets/allenai/wildjailbreak)(ODC-BY-1.0协议):41,100条提示。
* [Aya(Aya)](https://huggingface.co/datasets/CohereForAI/aya_dataset)(Apache 2.0协议):98,597条提示。
* [TableGPT(TableGPT)](https://huggingface.co/datasets/LipengCS/Table-GPT)(MIT协议):4,981条提示。
* Olmo 身份提示集:原始包含58个样本(本团队实际训练时采用290个样本,每条提示重复5次,本次上传至Hugging Face的版本仅包含单份重复样本)。
由于本数据集针对数据质量与Azure API中被拦截请求对应的主题进行了更为严格的筛选,因此其样本数量少于从Tülu 3与OLMo 2中直接提取的原始提示源规模。
本数据集用于7B规模模型的后训练,[7B微调数据集](https://huggingface.co/datasets/allenai/Dolci-Think-SFT)存在细微差异。
## 数据集结构
本数据集的每条样本均包含标准的指令微调数据字段,具体如下:
- `id`(字符串类型):唯一标识符
- `messages`(列表类型):用于监督微调(Supervised Fine-Tuning, SFT)的消息格式(包含用户提示与助手回复)
- `source`(字符串类型):当前样本所属的源数据集
每条数据均包含模型的推理过程,其被包裹在`<think>...</think>`标签中,且不包含`<answer>...</answer>`标签——答案紧跟在`</think>`之后。
## 模型家族
| **训练阶段** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **基础模型** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **监督微调(SFT)** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **偏好优化(DPO)** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **最终模型(RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## 授权协议
本数据集采用ODC-BY协议进行授权,仅可用于研究与教育用途,并需遵循Ai2发布的[负责任使用指南](https://allenai.org/responsible-use)。
## 引用
@misc{olmo2025olmo3,
title={Olmo 3},
author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi},
year={2025},
eprint={2512.13961},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.13961},
}
提供机构:
maas
创建时间:
2025-11-21



