Dolci-Think-SFT

Name: Dolci-Think-SFT
Creator: maas
Published: 2025-12-03 17:29:36
License: 暂无描述

魔搭社区2025-12-03 更新2025-11-22 收录

下载链接：

https://modelscope.cn/datasets/allenai/Dolci-Think-SFT

下载链接

链接失效反馈

官方服务：

资源简介：

# Dolci-Think-SFT Sources include a mixture of existing reasoning traces: * [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M) (Apache 2.0): Extended to 32K context length and downsampled code prompts to 16X multiple, to 941,164 total prompts. Access our version, Dolci OpenThoughts 3 here. * [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified) (Apache 2.0) via the SFT-Verified split, 104,568 prompts. * [Nemotron Post-training dataset](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) (CC BY 4), code split only, 113,777 prompts. New prompts and new reasoning traces from us (all ODC-BY-1.0): * Dolci Think Persona IF: New precise instruction following prompts and traces created with [Nvidia's Nemotron Post-training Personas](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA). 220,530 prompts. * Dolci Precise IF: New multi-constraint instruction following data building off Pyatkin, Valentina, et al. "[Generalizing Verifiable Instruction Following](https://arxiv.org/abs/2507.02833)." (2025). 135,722 prompts. * [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python): 466,676 prompts (subsampled from larger mix). Existing prompts with new reasoning traces, largely repurposed from Tülu 3 / OLMo 2, with new traces generated by a mix of DeepSeek R1 and DeepSeek R1 0528: * [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) (ODC-BY-1.0), 76,209 prompts. * [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1) (Apache 2.0), 6,647 prompts. * [CoCoNot](https://huggingface.co/datasets/allenai/coconot) (ODC-BY-1.0), 9,549 prompts. * [WildGuardMix ](https://huggingface.co/datasets/allenai/wildguardmix) (Apache 2.0), 36,673 prompts. * [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak) (ODC-BY-1.0) 40,002 prompts. * [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset) (Apache 2.0), 97,156 prompts. * [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT) (MIT), 4,973 prompts. * Olmo Identity Prompts, 58 samples (we trained with 290, 5 repetitions per prompt, uploaded single repetition to HuggingFace) The counts are smaller than the original prompt sources pulled from Tülu 3 / OLMo 2 due to more extensive filtering for data quality and by topics within the Azure API (blocked requests). This dataset was used for 32B post-training, the [7B dataset](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B) is slightly different. ## Dataset Structure Each example in the dataset contains the standard instruction-tuning data points as follow: - `id` (str): a unique identifier - `messages` (list): message format used for supervised fine-tuning (this contains user prompt and assistant responses) - `source` (str): the source dataset for the given sample Every datapoint contains the model's reasoning in `<think>...</think>` and NO `<answer>...</answer>` tags -- the answer follows directly after `</think>`. ## Model Family | **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** | |--------------------------|-----------------------|------------------------|---------------------------| | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | ## License This dataset is licensed under ODC-BY. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). ## Citation Coming soon. For now, see our [technical report](https://allenai.org/olmo3.pdf).

# Dolci-Think-SFT 监督微调数据集数据集来源包含多种现有推理轨迹： * [OpenThoughts 3](https://huggingface.co/datasets/open-thoughts/OpenThoughts3-1.2M)（Apache 2.0协议）：将上下文长度拓展至32K，并将代码提示下采样至原规模的1/16，最终总提示数达941,164条。可在此获取我们整理的版本：Dolci OpenThoughts 3。 * [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified)（Apache 2.0协议）：采用其SFT-Verified划分，共包含104,568条提示。 * [Nemotron 后训练数据集](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1)（CC BY 4协议）：仅采用其代码划分部分，共113,777条提示。以下为我们原创的提示与推理轨迹（均采用ODC-BY-1.0协议）： * Dolci Think Persona IF：基于[Nvidia Nemotron 后训练人设数据集](https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA)构建的高精度指令遵循提示与推理轨迹，共220,530条提示。 * Dolci Precise IF：基于Pyatkin, Valentina等人于2025年发表的论文《Generalizing Verifiable Instruction Following》（https://arxiv.org/abs/2507.02833）构建的多约束指令遵循数据集，共135,722条提示。 * [Dolci Think Python](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-Python)：共466,676条提示（从更大规模的混合数据集中下采样得到）。以下为基于现有提示构建的新推理轨迹，主要源自Tülu 3与OLMo 2数据集，新轨迹由DeepSeek R1与DeepSeek R1 0528联合生成： * [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M)（ODC-BY-1.0协议）：共76,209条提示。 * [OpenAssistant Guanaco](https://huggingface.co/datasets/OpenAssistant/oasst1)（Apache 2.0协议）：共6,647条提示。 * [CoCoNot](https://huggingface.co/datasets/allenai/coconot)（ODC-BY-1.0协议）：共9,549条提示。 * [WildGuardMix](https://huggingface.co/datasets/allenai/wildguardmix)（Apache 2.0协议）：共36,673条提示。 * [WildJailbreak](https://huggingface.co/datasets/allenai/wildjailbreak)（ODC-BY-1.0协议）：共40,002条提示。 * [Aya](https://huggingface.co/datasets/CohereForAI/aya_dataset)（Apache 2.0协议）：共97,156条提示。 * [TableGPT](https://huggingface.co/datasets/LipengCS/Table-GPT)（MIT协议）：共4,973条提示。 * Olmo 身份提示：共58条样本（我们训练时采用了290条样本，每条提示重复5次，仅上传单份重复至Hugging Face平台）。由于针对数据质量与Azure API中被拦截请求对应的主题进行了更严格的筛选，本数据集的提示数少于从Tülu 3与OLMo 2中直接提取的原始提示源规模。本数据集用于32B参数模型的后训练，[7B参数版本数据集](https://huggingface.co/datasets/allenai/Dolci-Think-SFT-7B)存在细微差异。 ## 数据集结构数据集中的每条样本均包含标准的监督微调（Supervised Fine-Tuning, SFT）数据字段，具体如下： - `id`（字符串类型）：唯一标识符 - `messages`（列表类型）：用于监督微调的对话格式字段，包含用户提示与助手回复 - `source`（字符串类型）：当前样本所属的源数据集每条样本均包含模型的推理过程，该过程包裹在`<think>...</think>`标签中，且无`<answer>...</answer>`标签，模型的最终答案紧随`</think>`之后。 ## 模型家族 | **训练阶段** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** | |--------------------------|-----------------------|------------------------|---------------------------| | **基础模型** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | | **监督微调（SFT）** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | | **直接偏好优化（DPO）** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | | **最终模型（RLVR）** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | ## 许可证本数据集采用ODC-BY协议进行许可，仅可用于研究与教育用途，请遵循Ai2发布的[负责任使用指南](https://allenai.org/responsible-use)。 ## 引用信息引用信息即将发布，目前可参考我们的[技术报告](https://allenai.org/olmo3.pdf)。

提供机构：

maas

创建时间：

2025-11-21

5,000+

优质数据集

54 个

任务类型

进入经典数据集