Superior-Reasoning-SFT-gpt-oss-120b

Name: Superior-Reasoning-SFT-gpt-oss-120b
Creator: maas
Published: 2026-05-17 01:16:52
License: no description provided

ModelScope community · updated 2026-05-17 · indexed 2026-02-07

Download link:

https://modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b


Resource description:

# Superior-Reasoning-SFT-gpt-oss-120b

<img src="assets/dasd-logo.png" alt="Ali" style="vertical-align: middle;">

[![GitHub](https://img.shields.io/badge/GitHub-DASD--Thinking-181717?logo=github&logoColor=white)](https://github.com/D2I-ai/dasd-thinking)
<a href="https://arxiv.org/abs/2601.09088" target="_blank"><img src="https://img.shields.io/badge/Technical Report-b5212f.svg?logo=arxiv" height="21px"></a>
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-DASD--4B--Thinking-yellow)](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking)
[![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-DASD--4B--Thinking-624aff)](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-4B-Thinking)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Checkpoint-DASD--30B--A3B--Thinking--Preview-yellow)](https://huggingface.co/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview)
[![ModelScope Model](https://img.shields.io/badge/🤖%20Checkpoint-DASD--30B--A3B--Thinking--Preview-624aff)](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Superior--Reasoning--SFT--gpt--oss--120b-red)](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b)
[![ModelScope Model](https://img.shields.io/badge/🤖%20Dataset-Superior--Reasoning--SFT--gpt--oss--120b-124aff)](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Superior--Reasoning--SFT--gpt--oss--120b--Logprob-red)](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob)
[![ModelScope Model](https://img.shields.io/badge/🤖%20Dataset-Superior--Reasoning--SFT--gpt--oss--120b--Logprob-124aff)](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob)

## 🚀 Overview

The **Superior-Reasoning-SFT-gpt-oss-120b**
dataset is a high-quality, open-source collection of **435K** samples designed to democratize the training of high-performance Long Chain-of-Thought (Long-CoT) models. Unlike standard distilled datasets that rely on random sampling or heuristic filtering, Superior-Reasoning-SFT-gpt-oss-120b is constructed using a principled **Distribution-Aligned Sequence Distillation** pipeline. This pipeline addresses two key limitations in current reasoning distillation, **distributional mismatch** and **exposure bias**. It enables small dense models (like **[DASD-4B-Thinking](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking)**) to achieve State-of-the-Art performance among open-source models of comparable scale and to outperform significantly larger models.

<div style="text-align: center;">
  <img src="assets/size_4b-performance-img.jpg" alt="benchmark" style="width: 90%;">
</div>

**Superior-Reasoning-SFT-gpt-oss-120b** contains the Stage 1 and Stage 2 SFT training data for DASD-4B-Thinking and can be used to reproduce the performance of DASD-4B-Thinking.

### Key Features

* **Extreme Data Efficiency**: Contains only **435K** total samples, an order of magnitude fewer than comparable datasets, yet yields SOTA results.
* **Diverse Domain Coverage**: Spans **Mathematics**, **Code Generation**, **Scientific Reasoning**, and **Instruction Following**.
* **Teacher Provenance**: All data is distilled from **gpt-oss-120b** using its high-reasoning mode, ensuring high-quality reasoning traces.

### Training Tip

As detailed in the technical report, our training pipeline integrates three complementary mechanisms:

- **Temperature-scheduled Learning**: combines low-temperature trajectories (stability) with higher-temperature trajectories (diversity and exploration), improving both learning stability and coverage of the teacher's modes.
- **Divergence-aware Sampling**: selects sequences based on teacher-student distributional mismatch at the sentence level, to find a target sequence-level distribution that better supports effective learning. We also provide the teacher model's log probabilities in our [Superior-Reasoning-SFT-gpt-oss-120b-Logprob](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) dataset.
- **Mixed-policy Distillation**: introduces a lightweight on-policy revision protocol after off-policy SFT, using student-generated prefixes with teacher rewrites to improve robustness and mitigate exposure bias in the distilled model.

Together, these components form a **coherent, distribution-aware sequence distillation pipeline** that improves the fidelity of reasoning transfer while remaining data-efficient, making strong long-reasoning behavior feasible in a 4B dense model.

<div style="text-align: center;">
  <img src="assets/pipeline.jpg" alt="DASD-Thinking training pipeline" style="width: 90%;">
</div>

The **Superior-Reasoning-SFT-gpt-oss-120b** dataset comprises the training samples used in Temperature-scheduled Learning, divided into two stages:

- Low-Temperature Training (Stage 1, ~105K samples)
- High-Temperature Training (Stage 2, ~330K samples)

(Note: The training data for the Mixed-policy Distillation stage is generated by the model itself after it completes Stage 1 and Stage 2 training, and is therefore not included in this dataset.)

All samples across both stages are generated via **Divergence-aware Sampling** and subjected to rigorous quality control, such as filtering out truncated outputs and repetitive content.

For best results, we recommend a **staged training recipe**:

- Train on **Stage 1** first for **stability**.
- Then continue training on **Stage 2** for **diversity**.

(Note: Our [DASD-30B-A3B-Thinking-Preview](https://huggingface.co/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview) model also achieved strong results using *only* Stage 1 data.)
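The staged recipe above can be sketched as a simple partition of the records by stage. This is a minimal illustration rather than the authors' training code: it assumes each record is a dict shaped like the sample shown under Data Format, with `meta.training_stage` set to `stage1` or `stage2`. The helper name `split_by_stage` and the toy records are hypothetical.

```python
from collections import defaultdict


def split_by_stage(records):
    """Group records by meta.training_stage (hypothetical helper)."""
    by_stage = defaultdict(list)
    for record in records:
        by_stage[record["meta"]["training_stage"]].append(record)
    return dict(by_stage)


# Toy records that mimic the documented layout; real samples also carry
# "input", "output", and the remaining "meta" fields.
toy_records = [
    {"uuid": "a", "domain": "math", "meta": {"training_stage": "stage1"}},
    {"uuid": "b", "domain": "code", "meta": {"training_stage": "stage2"}},
    {"uuid": "c", "domain": "science", "meta": {"training_stage": "stage1"}},
]

stages = split_by_stage(toy_records)

# Staged recipe: train on Stage 1 first, then continue on Stage 2.
curriculum = [stages.get("stage1", []), stages.get("stage2", [])]
stage1_ids = [r["uuid"] for r in curriculum[0]]
stage2_ids = [r["uuid"] for r in curriculum[1]]
print(stage1_ids, stage2_ids)
```

Keeping the two stages as separate lists, rather than shuffling them together, preserves the order the recipe recommends: the Stage 2 run starts from the final Stage 1 checkpoint.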
As an example, when training Qwen3-4B-Instruct-2507, we suggest following the [configuration for Stage 1](https://github.com/D2I-ai/dasd-thinking/blob/main/train/stage1.yaml) and the [configuration for Stage 2](https://github.com/D2I-ai/dasd-thinking/blob/main/train/stage2.yaml).

### Data Sources

We use the following publicly available datasets as question sources:

- nvidia/AceReason-1.1-SFT
- nvidia/OpenCodeReasoning
- nvidia/OpenScienceReasoning-2
- a-m-team/AM-DeepSeek-R1-Distilled-1.4M

## 📄 Data Format

The DASD-Thinking dataset adheres to a structured JSON format, with each instance containing the following fields:

### Sample:

```json
{
  "uuid": "a7c94c27-1c6a-434a-bec9-dd90c1acac28",
  "input": "Find the sum of the first six terms of an arithmetic sequence in which the sum of any number of terms is equal to four times the square of that number. Let's think step by step and output the final answer within \\boxed{}.",
  "output": "<think> We need to parse the problem: \"Find the sum of the first six terms of an arithmetic sequence in which the sum of any number of terms is equal to four times the square of that number.\" So we have an arithmetic sequence (i.e., terms are a, a+d, a+2d, ...)...",
  "domain": "math",
  "meta": {
    "training_stage": "stage1",
    "sampling_temperature": 0.6,
    "teacher_model": "gpt-oss-120b",
    "logprob_filename": "stage1/math/train-00073-of-00096.parquet"
  }
}
```

### Dedicated Fields

- **`uuid`** (`str`): A unique identifier for the sample.
- **`input`** (`str`): The model input prompt, typically a query or instruction.
- **`output`** (`str`): The corresponding teacher-generated response.
- **`domain`** (`str`): The task domain (e.g., `"math"`, `"code"`, `"science"`), indicating the nature of the problem.
- **`meta`** (`dict`): A metadata dictionary containing auxiliary information used during data generation and training:
  - **`training_stage`** (`str`): The training stage the sample belongs to (e.g., `stage1` for Stage 1, `stage2` for Stage 2).
  - **`sampling_temperature`** (`float`): The temperature used by the teacher model when generating the `output`.
  - **`teacher_model`** (`str`): The name or identifier of the teacher model that produced the response.
  - **`logprob_filename`** (`str`): The name of the file containing the corresponding log probabilities in the [Superior-Reasoning-SFT-gpt-oss-120b-Logprob](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) dataset.

## 📊 Proven Effectiveness

Models trained on this specific dataset recipe achieve State-of-the-Art performance for their size class.

### 4B Dense Model Performance

| Model / Setting                       | AIME24 | AIME25 | LiveCodeBench v5 | LiveCodeBench v6 | GPQA-D |
| ------------------------------------- | -----: | -----: | ---------------: | ---------------: | -----: |
| Qwen3-4B-Instruct-2507                |      - |   47.4 |                - |             35.1 |   62.5 |
| + Low-Temperature Training (stage 1)  |   84.2 |   74.0 |             56.6 |             50.6 |   67.7 |
| + High-Temperature Training (stage 2) |   87.7 |   83.0 |             68.4 |             67.2 |   67.6 |

In both Stage 1 and Stage 2, we use an initial learning rate of 5e-5 and a global batch size of 64, training for 6 epochs. Stage 2 continues training from the final checkpoint of Stage 1. Detailed training configurations can be found in our [GitHub repository](https://github.com/D2I-ai/dasd-thinking/blob/main/train) or [technical report](https://arxiv.org/abs/2601.09088).

### 30B MoE Model Performance

**[DASD-30B-A3B-Thinking-Preview](https://huggingface.co/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview)** (trained on **Stage 1 data only**) demonstrates strong data efficiency.

| Model                                | AIME25 | LiveCodeBench v6 | GPQA-D | Average |
| ------------------------------------ | -----: | ---------------: | -----: | ------: |
| gpt-oss-20b                          |   91.7 |             61.0 |   71.5 |    74.7 |
| Qwen3-30B-A3B-Thinking-2507          |   85.0 |             66.0 |   73.4 |    74.8 |
| NVIDIA-Nemotron-3-Nano-30B-A3B       |   89.1 |             68.3 |   73.0 |    76.8 |
| DASD-30B-A3B-Thinking-Preview (Ours) |   86.7 |             72.8 |   72.3 |    77.3 |

DASD-30B-A3B-Thinking-Preview has only undergone the first stage of training, using an initial learning rate of 5e-5 and a global batch size of 64 for 6 epochs. Detailed training configurations can be found in our [GitHub repository](https://github.com/D2I-ai/dasd-thinking/blob/main/train/stage1.yaml) or [technical report](https://arxiv.org/abs/2601.09088).

## 📜 Dataset Access & License

The dataset is released under **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)**.

## 📚 Citation

DASD-Thinking is developed by Alibaba Cloud, as part of our mission to advance open, efficient, and trustworthy reasoning systems. If you find this work useful in your research or applications, please cite our technical report.

```bibtex
@article{yan2026dasd,
  title={Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning},
  author={Yan, Shaotian and Liu, Kaiyuan and Shen, Chen and Wang, Bing and Fan, Sinan and Zhang, Jun and Wu, Yue and Wang, Zheng and Ye, Jieping},
  year={2026},
  journal={arXiv preprint arXiv:2601.09088},
  url={https://arxiv.org/abs/2601.09088}
}

@article{liu2025where,
  title={Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation},
  author={Liu, Kaiyuan and Yan, Shaotian and Miao, Rui and Wang, Bing and Shen, Chen and Zhang, Jun and Ye, Jieping},
  journal={arXiv preprint arXiv:2512.20908},
  year={2025}
}
```

We welcome collaboration, feedback, and community contributions to push the boundaries of what small models can reason about, transparently and responsibly.
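As a quick sanity check on the 30B MoE table, the sketch below recomputes the Average column. It assumes the column is the unweighted mean of the three benchmark scores rounded to one decimal place. The source does not state this, but the reported numbers are consistent with it. The scores are copied from the table, and the helper name `mean_score` is hypothetical.

```python
# (AIME25, LiveCodeBench v6, GPQA-D, reported Average), copied from the table.
scores = {
    "gpt-oss-20b": (91.7, 61.0, 71.5, 74.7),
    "Qwen3-30B-A3B-Thinking-2507": (85.0, 66.0, 73.4, 74.8),
    "NVIDIA-Nemotron-3-Nano-30B-A3B": (89.1, 68.3, 73.0, 76.8),
    "DASD-30B-A3B-Thinking-Preview": (86.7, 72.8, 72.3, 77.3),
}


def mean_score(values):
    """Unweighted mean of the benchmark scores (assumed definition)."""
    return sum(values) / len(values)


computed = {}
for model, (*benchmarks, reported) in scores.items():
    computed[model] = round(mean_score(benchmarks), 1)
    print(f"{model}: computed {computed[model]}, reported {reported}")
```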


Provider: maas

Created: 2026-01-13

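Summary

Dataset introduction

Background and challenges

Background overview

Superior-Reasoning-SFT-gpt-oss-120b is a high-quality open-source dataset of 435K samples designed for training high-performance Long Chain-of-Thought models. It is built with the Distribution-Aligned Sequence Distillation method and covers mathematics, code generation, scientific reasoning, and instruction following. All data is distilled from the gpt-oss-120b model, with the aim of improving the reasoning ability of small dense models.

The summary above was collected and generated by 遇见数据集.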
