
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b

Source: Hugging Face. Updated 2026-01-31. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b
Description:
The **Superior-Reasoning-SFT-gpt-oss-120b** dataset is a high-quality, open-source collection of **435K** samples designed to democratize the training of high-performance Long Chain-of-Thought (Long-CoT) models. Unlike standard distilled datasets that rely on random sampling or heuristic filtering, it is constructed with a principled **Distribution-Aligned Sequence Distillation** pipeline. This pipeline addresses key limitations of current reasoning distillation, specifically **distributional mismatch** and **exposure bias**, and enables small dense models (such as **[DASD-4B-Thinking](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking)**) to achieve state-of-the-art performance among open-source models of comparable scale and to outperform significantly larger models. The dataset spans **Mathematics**, **Code Generation**, **Scientific Reasoning**, and **Instruction Following**, with all data distilled from **gpt-oss-120b** in its high-reasoning mode, ensuring high-quality reasoning traces.
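For readers who want to inspect a few records before committing to a full download, the following is a minimal sketch using the Hugging Face `datasets` library in streaming mode. The helper name `stream_samples`, the `train` split, and the default configuration are assumptions for illustration; the listing above does not state the dataset's splits, configurations, or column names, so check the dataset card and pass `config_name` if the repository defines several configurations.

```python
from typing import Optional

# Repository id taken from the download link in this listing.
REPO_ID = "Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b"


def stream_samples(n: int = 3, config_name: Optional[str] = None, split: str = "train"):
    """Return the first n records without downloading the whole dataset.

    The split name and configuration are assumptions, not facts from the listing.
    """
    # Imported lazily so this module loads even if `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True yields records lazily over the network.
    ds = load_dataset(REPO_ID, name=config_name, split=split, streaming=True)
    return list(ds.take(n))


# Example usage (requires network access and `pip install datasets`):
#   for record in stream_samples():
#       print(sorted(record.keys()))
```

To go through the mirror host in the download link rather than huggingface.co, setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before starting Python is the usual approach; treat that as an assumption to verify against the mirror's own instructions.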
Provider:
Alibaba-Apsara