Dolci-Think-SFT
收藏魔搭社区2025-12-03 更新2025-11-22 收录
下载链接:
https://modelscope.cn/datasets/allenai/Dolci-Think-SFT
下载链接
链接失效反馈官方服务:
资源简介:
# Dolci-Think-SFT
Sources include a mixture of existing reasoning traces:
* [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M) (Apache 2.0): Extended to 32K context length and downsampled code prompts to 16X multiple, to 941,164 total prompts. Access our version, Dolci OpenThoughts 3 here.
* [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified) (Apache 2.0) via the SFT-Verified split, 104,568 prompts.
* [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) (CC BY 4), code split only, 113,777 prompts.
New prompts and new reasoning traces from us (all ODC-BY-1.0):
* Dolci Think Persona IF: New precise instruction following prompts and traces created with [Nvidia's Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA). 220,530 prompts.
* Dolci Precise IF: New multi-constraint instruction following data building off Pyatkin, Valentina, et al. "[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)." (2025). 135,722 prompts.
* [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python): 466,676 prompts (subsampled from larger mix).
Existing prompts with new reasoning traces, largely repurposed from Tülu 3 / OLMo 2, with new traces generated by a mix of DeepSeek R1 and DeepSeek R1 0528:
* [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 76,209 prompts.
* [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 6,647 prompts.
* [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 9,549 prompts.
* [WildGuardMix ](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 36,673 prompts.
* [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0) 40,002 prompts.
* [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 97,156 prompts.
* [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 4,973 prompts.
* Olmo Identity Prompts, 58 samples (we trained with 290, 5 repetitions per prompt, uploaded single repetition to HuggingFace)
The counts are smaller than the original prompt sources pulled from Tülu 3 / OLMo 2 due to more extensive filtering for data quality and by topics within the Azure API (blocked requests).
This dataset was used for 32B post-training, the [7B dataset](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B) is slightly different.
## Dataset Structure
Each example in the dataset contains the standard instruction-tuning data points as follow:
- `id` (str): a unique identifier
- `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses)
- `source` (str): the source dataset for the given sample
Every datapoint contains the model's reasoning in `<think>...</think>` and NO `<answer>...</answer>` tags -- the answer follows directly after `</think>`.
## Model Family
| **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## License
This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
## Citation
Coming soon. For now, see our [technical report](https://allenai.org/olmo3.pdf).
# Dolci-Think-SFT 监督微调数据集
数据集来源包含多种现有推理轨迹:
* [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)(Apache 2.0协议):将上下文长度拓展至32K,并将代码提示下采样至原规模的1/16,最终总提示数达941,164条。可在此获取我们整理的版本:Dolci OpenThoughts 3。
* [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified)(Apache 2.0协议):采用其SFT-Verified划分,共包含104,568条提示。
* [Nemotron 后训练数据集](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1)(CC BY 4协议):仅采用其代码划分部分,共113,777条提示。
以下为我们原创的提示与推理轨迹(均采用ODC-BY-1.0协议):
* Dolci Think Persona IF:基于[Nvidia Nemotron 后训练人设数据集](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA)构建的高精度指令遵循提示与推理轨迹,共220,530条提示。
* Dolci Precise IF:基于Pyatkin, Valentina等人于2025年发表的论文《Generalizing Verifiable Instruction Following》(https://arxiv.org/abs/2507.02833)构建的多约束指令遵循数据集,共135,722条提示。
* [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python):共466,676条提示(从更大规模的混合数据集中下采样得到)。
以下为基于现有提示构建的新推理轨迹,主要源自Tülu 3与OLMo 2数据集,新轨迹由DeepSeek R1与DeepSeek R1 0528联合生成:
* [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M)(ODC-BY-1.0协议):共76,209条提示。
* [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)(Apache 2.0协议):共6,647条提示。
* [CoCoNot](https://huggingface.co/datasets/allenai/coconot)(ODC-BY-1.0协议):共9,549条提示。
* [WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)(Apache 2.0协议):共36,673条提示。
* [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)(ODC-BY-1.0协议):共40,002条提示。
* [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)(Apache 2.0协议):共97,156条提示。
* [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)(MIT协议):共4,973条提示。
* Olmo 身份提示:共58条样本(我们训练时采用了290条样本,每条提示重复5次,仅上传单份重复至Hugging Face平台)。
由于针对数据质量与Azure API中被拦截请求对应的主题进行了更严格的筛选,本数据集的提示数少于从Tülu 3与OLMo 2中直接提取的原始提示源规模。
本数据集用于32B参数模型的后训练,[7B参数版本数据集](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B)存在细微差异。
## 数据集结构
数据集中的每条样本均包含标准的监督微调(Supervised Fine-Tuning, SFT)数据字段,具体如下:
- `id`(字符串类型):唯一标识符
- `messages`(列表类型):用于监督微调的对话格式字段,包含用户提示与助手回复
- `source`(字符串类型):当前样本所属的源数据集
每条样本均包含模型的推理过程,该过程包裹在`<think>...</think>`标签中,且无`<answer>...</answer>`标签,模型的最终答案紧随`</think>`之后。
## 模型家族
| **训练阶段** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|--------------------------|-----------------------|------------------------|---------------------------|
| **基础模型** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
| **监督微调(SFT)** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
| **直接偏好优化(DPO)** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
| **最终模型(RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
## 许可证
本数据集采用ODC-BY协议进行许可,仅可用于研究与教育用途,请遵循Ai2发布的[负责任使用指南](https://allenai.org/responsible-use)。
## 引用信息
引用信息即将发布,目前可参考我们的[技术报告](https://allenai.org/olmo3.pdf)。
提供机构:
maas
创建时间:
2025-11-21



