Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b

Name: Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b
Creator: Alibaba-Apsara
Published: 2026-01-31 10:05:46
License: not specified

Source: Hugging Face. Updated 2026-01-31; indexed 2026-02-07.

Download link:

https://hf-mirror.com/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b

Overview:

The **Superior-Reasoning-SFT-gpt-oss-120b** dataset is a high-quality, open-source collection of **435K** samples designed to democratize the training of high-performance Long Chain-of-Thought (Long-CoT) models. Unlike standard distilled datasets that rely on random sampling or heuristic filtering, it is constructed using a principled **Distribution-Aligned Sequence Distillation** pipeline. This pipeline addresses key limitations in current reasoning distillation, specifically **distributional mismatch** and **exposure bias**, enabling small dense models (like **[DASD-4B-Thinking](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking)**) to achieve state-of-the-art performance among open-source models of comparable scale and to outperform significantly larger models. The dataset spans **Mathematics**, **Code Generation**, **Scientific Reasoning**, and **Instruction Following**. All data is distilled from **gpt-oss-120b** in its high-reasoning mode to ensure high-quality reasoning traces.
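For readers who want to inspect a few samples before downloading all 435K, the following is a minimal sketch using the Hugging Face `datasets` library in streaming mode. It assumes the library is installed and the repository is publicly readable. The helper name `peek` is hypothetical, and the configuration, split, and column names are not stated on this page, so the sketch discovers them at run time rather than hard-coding them.

```python
from itertools import islice

# Repository id taken from the download link on this page.
REPO_ID = "Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b"


def peek(repo_id: str = REPO_ID, n: int = 3):
    """Stream the first n records of the first config/split (needs network access).

    Hypothetical helper: config and split names are discovered, not assumed.
    """
    # Imported lazily so the module can be loaded without `datasets` installed.
    from datasets import get_dataset_config_names, load_dataset

    configs = get_dataset_config_names(repo_id)  # every config the repo exposes
    # With streaming=True and no split argument, a dict of iterable splits is returned.
    splits = load_dataset(repo_id, configs[0], streaming=True)
    split_name = next(iter(splits))  # first available split, whatever it is called
    records = list(islice(splits[split_name], n))
    return configs, split_name, records


if __name__ == "__main__":
    configs, split_name, records = peek()
    print("configs:", configs)
    print("split:", split_name)
    for record in records:
        # Print only the field names, since the schema is not documented here.
        print(sorted(record.keys()))
```

Streaming avoids materializing the full dataset on disk, which is usually preferable for a first look at long reasoning traces.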

Provider:

Alibaba-Apsara
