Dolci-Think-SFT-32B

Name: Dolci-Think-SFT-32B
Creator: maas
Published: 2026-01-06 16:52:49
License: 暂无描述

魔搭社区2026-01-06 更新2025-12-06 收录

下载链接：

https://modelscope.cn/datasets/allenai/Dolci-Think-SFT-32B

下载链接

链接失效反馈

官方服务：

资源简介：

# Dolci-Think-SFT Sources include a mixture of existing reasoning traces: * [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M) (Apache 2.0): Extended to 32K context length and downsampled code prompts to 16X multiple, to 941,164 total prompts. Access our version, Dolci OpenThoughts 3 here. * [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified) (Apache 2.0) via the SFT-Verified split, 104,568 prompts. * [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) (CC BY 4), code split only, 113,777 prompts. New prompts and new reasoning traces from us (all ODC-BY-1.0): * Dolci Think Persona IF: New precise instruction following prompts and traces created with [Nvidia's Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA). 220,530 prompts. * Dolci Precise IF: New multi-constraint instruction following data building off Pyatkin, Valentina, et al. "[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)." (2025). 135,722 prompts. * [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python): 466,676 prompts (subsampled from larger mix). Existing prompts with new reasoning traces, largely repurposed from Tülu 3 / OLMo 2, with new traces generated by a mix of DeepSeek R1 and DeepSeek R1 0528: * [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 76,209 prompts. * [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 6,647 prompts. * [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 9,549 prompts. * [WildGuardMix ](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 36,673 prompts. * [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0) 40,002 prompts. * [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 97,156 prompts. * [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 4,973 prompts. * Olmo Identity Prompts, 58 samples (we trained with 290, 5 repetitions per prompt, uploaded single repetition to HuggingFace) The counts are smaller than the original prompt sources pulled from Tülu 3 / OLMo 2 due to more extensive filtering for data quality and by topics within the Azure API (blocked requests). This dataset was used for 32B post-training, the [7B dataset](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B) is slightly different. ## Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - `id` (str): a unique identifier - `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses) - `source` (str): the source dataset for the given sample Every datapoint contains the model's reasoning in `<think>...</think>` and NO `<answer>...</answer>` tags -- the answer follows directly after `</think>`. ## Model Family | **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** | |--------------------------|-----------------------|------------------------|---------------------------| | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | ## License This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). ## Citation ``` @misc{olmo2025olmo3, title={Olmo 3}, author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi}, year={2025}, eprint={2512.13961}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2512.13961}, } ```

# Dolci-Think-SFT 本数据集的来源包含多种现有推理轨迹： * [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)（Apache 2.0协议）：该数据集已被扩展至32K上下文长度，并将代码提示进行16倍下采样，最终总提示数达941,164条。可获取本项目的OpenThoughts 3版本。 * [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified)（Apache 2.0协议）：采用其SFT-Verified划分，包含104,568条提示。 * [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1)（CC BY 4协议）：仅采用其代码划分，包含113,777条提示。本项目原创的提示与推理轨迹（均采用ODC-BY-1.0协议）： * Dolci Think Persona IF：基于[Nvidia 的 Nemotron 后训练人设数据集](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA)构建的新型精准指令遵循提示与推理轨迹，共包含220,530条提示。 * Dolci Precise IF：基于Pyatkin, Valentina等人于2025年发表的论文《[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)》构建的新型多约束指令遵循数据集，共包含135,722条提示。 * [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python)：包含466,676条提示（从更大的混合数据集中下采样得到）。基于现有提示并新增推理轨迹：大部分数据源自Tülu 3 / OLMo 2，新增的推理轨迹由DeepSeek R1与DeepSeek R1 0528联合生成： * [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M)（ODC-BY-1.0协议），包含76,209条提示。 * [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)（Apache 2.0协议），包含6,647条提示。 * [CoCoNot](https://huggingface.co/datasets/allenai/coconot)（ODC-BY-1.0协议），包含9,549条提示。 * [WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)（Apache 2.0协议），包含36,673条提示。 * [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)（ODC-BY-1.0协议），包含40,002条提示。 * [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)（Apache 2.0协议），包含97,156条提示。 * [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)（MIT协议），包含4,973条提示。 * Olmo Identity Prompts：共58条样本（实际训练时使用了290条，即每条提示重复5次，本次上传至HuggingFace的版本为单份重复样本）。由于针对数据质量与Azure API内的主题（含被拦截的请求）进行了更严格的筛选，本数据集的提示总数低于从Tülu 3 / OLMo 2中直接获取的原始源数据规模。本数据集用于32B参数模型的后训练，[7B参数版本数据集](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B)存在细微差异。 ## 数据集结构数据集中的每条样本均包含标准的指令微调数据格式，具体如下： - `id`（字符串类型）：唯一标识符 - `messages`（列表类型）：用于监督微调的消息格式（包含用户提示与助手回复） - `source`（字符串类型）：当前样本所属的源数据集每条样本均包含模型的推理过程，该过程包裹在`<think>...</think>`标签内，且不包含`<answer>...</answer>`标签——答案紧随`</think>`之后输出。 ## 模型家族 | **训练阶段** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B 指令微调** | |--------------------------|-----------------------|------------------------|---------------------------| | **基础模型** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | | **监督微调（SFT）** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | | **偏好优化（DPO）** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | | **最终模型（RLVR）** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | ## 授权协议本数据集采用ODC-BY协议进行授权，仅可用于研究与教育用途，并需遵循Ai2发布的[负责任使用指南](https://allenai.org/responsible-use)。 ## 引用格式 @misc{olmo2025olmo3, title={Olmo 3}, author={Team Olmo and Allyson Ettinger and Amanda Bertsch and Bailey Kuehl and David Graham and David Heineman and Dirk Groeneveld and Faeze Brahman and Finbarr Timbers and Hamish Ivison and Jacob Morrison and Jake Poznanski and Kyle Lo and Luca Soldaini and Matt Jordan and Mayee Chen and Michael Noukhovitch and Nathan Lambert and Pete Walsh and Pradeep Dasigi and Robert Berry and Saumya Malik and Saurabh Shah and Scott Geng and Shane Arora and Shashank Gupta and Taira Anderson and Teng Xiao and Tyler Murray and Tyler Romero and Victoria Graf and Akari Asai and Akshita Bhagia and Alexander Wettig and Alisa Liu and Aman Rangapur and Chloe Anastasiades and Costa Huang and Dustin Schwenk and Harsh Trivedi and Ian Magnusson and Jaron Lochner and Jiacheng Liu and Lester James V. Miranda and Maarten Sap and Malia Morgan and Michael Schmitz and Michal Guerquin and Michael Wilson and Regan Huff and Ronan Le Bras and Rui Xin and Rulin Shao and Sam Skjonsberg and Shannon Zejiang Shen and Shuyue Stella Li and Tucker Wilde and Valentina Pyatkin and Will Merrill and Yapei Chang and Yuling Gu and Zhiyuan Zeng and Ashish Sabharwal and Luke Zettlemoyer and Pang Wei Koh and Ali Farhadi and Noah A. Smith and Hannaneh Hajishirzi}, year={2025}, eprint={2512.13961}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2512.13961}, }

提供机构：

maas

创建时间：

2025-11-30

5,000+

优质数据集

54 个

任务类型

进入经典数据集