Superior-Reasoning-SFT-gpt-oss-120b-Logprob
Source: ModelScope community (魔搭社区). Updated 2026-04-30; indexed 2026-05-03.
Download link: https://modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob
# Superior-Reasoning-SFT-gpt-oss-120b-Logprob
<img src="assets/dasd-logo.png" alt="Ali" style="vertical-align: middle;">
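[GitHub: D2I-ai/dasd-thinking](https://github.com/D2I-ai/dasd-thinking) 
<a href="https://arxiv.org/abs/2601.09088" target="_blank"><img src="https://img.shields.io/badge/Technical Report-b5212f.svg?logo=arxiv" height="21px"></a>
[Hugging Face model: DASD-4B-Thinking](https://huggingface.co/Alibaba-Apsara/DASD-4B-Thinking) 
[ModelScope model: DASD-4B-Thinking](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-4B-Thinking) 
[Hugging Face model: DASD-30B-A3B-Thinking-Preview](https://huggingface.co/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview) 
[ModelScope model: DASD-30B-A3B-Thinking-Preview](https://www.modelscope.cn/models/Alibaba-Apsara/DASD-30B-A3B-Thinking-Preview) 
[Hugging Face dataset: Superior-Reasoning-SFT-gpt-oss-120b](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) 
[ModelScope dataset: Superior-Reasoning-SFT-gpt-oss-120b](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b) 
[Hugging Face dataset: Superior-Reasoning-SFT-gpt-oss-120b-Logprob](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) 
[ModelScope dataset: Superior-Reasoning-SFT-gpt-oss-120b-Logprob](https://www.modelscope.cn/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob) 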
## 🚀 Overview
This dataset contains the **token-level log-probabilities** generated by the teacher model (**gpt-oss-120b**) for the reasoning samples in the main **[Superior-Reasoning-SFT-gpt-oss-120b](https://huggingface.co/datasets/Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b)** Dataset.
## 🔗 Relationship to Main Dataset
This dataset is a **companion** to the main `Superior-Reasoning-SFT-gpt-oss-120b` dataset. Records are linked via a unique `sample_uuid`.
* **Main Dataset**: Contains the text (prompts, responses), domain info, and high-level metadata.
* **Logprobs Dataset (This)**: Contains the token IDs and their corresponding log-probability values from the teacher.
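A minimal sketch of how the two datasets might be joined on `sample_uuid`. The toy records and the `response` field name are assumptions made for illustration; only `sample_uuid`, `token_ids`, and `logprobs` are documented here, so check the main dataset's card for its actual column names.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Record = Dict[str, Any]


def join_by_uuid(
    main_records: Iterable[Record],
    logprob_records: Iterable[Record],
) -> Iterator[Tuple[Record, Record]]:
    """Yield (main, logprob) record pairs that share a sample_uuid."""
    # Index the logprob records once so each lookup is O(1).
    logprob_index = {rec["sample_uuid"]: rec for rec in logprob_records}
    for main in main_records:
        match = logprob_index.get(main["sample_uuid"])
        if match is not None:
            yield main, match


# Toy stand-ins for rows of the two datasets (hypothetical values).
main_records = [
    {"sample_uuid": "a", "response": "first answer"},
    {"sample_uuid": "b", "response": "second answer"},
]
logprob_records = [
    {"sample_uuid": "b", "token_ids": [1, 2], "logprobs": [-0.1, -0.2]},
]

pairs = list(join_by_uuid(main_records, logprob_records))
```

Building the index from the logprob side keeps memory proportional to one dataset; for data that does not fit in memory, a streaming or on-disk join would be a better fit.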
## 📄 Data Format
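Each record is a JSON object with three fields: `sample_uuid` (the key shared with the main dataset), `token_ids` (the token IDs of the sample), and `logprobs` (the teacher's log-probability for each corresponding token):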
### Example:
```json
{
  "sample_uuid": "e6ebfe0b-62a2-4b8b-ac53-4c8d48facbea",
  "token_ids": [2167, 1309, ...],
  "logprobs": [-0.08339496701955795, -1.2306615114212036, ...]
}
```
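As a rough illustration of how one record could be consumed, the sketch below parses a record and derives the sequence log-likelihood, the mean negative log-likelihood, and the perplexity. It assumes that `token_ids` and `logprobs` are parallel arrays and that the values are natural logarithms; neither is stated explicitly above. The record is truncated to the two values visible in the example, and `sequence_stats` is a hypothetical helper name.

```python
import json
import math
from typing import Any, Dict


def sequence_stats(record: Dict[str, Any]) -> Dict[str, float]:
    """Summarise the teacher log-probabilities of one record."""
    token_ids = record["token_ids"]
    logprobs = record["logprobs"]
    # Assumption: one log-probability per token ID.
    if len(token_ids) != len(logprobs):
        raise ValueError("token_ids and logprobs differ in length")
    if not logprobs:
        raise ValueError("record contains no tokens")
    total = math.fsum(logprobs)          # log-likelihood of the sequence
    mean_nll = -total / len(logprobs)    # average negative log-likelihood
    return {
        "num_tokens": len(logprobs),
        "total_logprob": total,
        "mean_nll": mean_nll,
        # Assumption: natural-log values, so exp() gives the perplexity.
        "perplexity": math.exp(mean_nll),
    }


# Only the first two tokens of the example record, for illustration.
raw = (
    '{"sample_uuid": "e6ebfe0b-62a2-4b8b-ac53-4c8d48facbea", '
    '"token_ids": [2167, 1309], '
    '"logprobs": [-0.08339496701955795, -1.2306615114212036]}'
)
stats = sequence_stats(json.loads(raw))
```

Mapping `token_ids` back to text requires the teacher's tokenizer, which this sketch deliberately leaves out.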
## 📊 Proven Effectiveness
Models trained with this data recipe achieve state-of-the-art performance for their size class on the benchmarks below.
### 4B Dense Model Performance
| Model / Setting | AIME24 | AIME25 | LiveCodeBench v5 | LiveCodeBench v6 | GPQA-D |
| --------------------------- | -----: | -----: | -----: | -----: | -----: |
| Qwen3-4B-Instruct-2507 | - | 47.4 | - | 35.1 | 62.5 |
| + Low-Temperature Training (stage 1) | 84.2 | 74.0 | 56.6 | 50.6 | 67.7 |
| + High-Temperature Training (stage 2) | 87.7 | 83.0 | 68.4 | 67.2 | 67.6 |
### 30B MoE Model Performance
**DASD-30B-A3B-Thinking-Preview** (trained on **Stage 1 data only**) achieves the highest average score among the models compared below, which points to strong data efficiency.
| Model | AIME25 | LiveCodeBench v6 | GPQA-D | Average |
| ------------------------------------ | -----: | -----: | -----: | ------: |
| gpt-oss-20b | 91.7 | 61.0 | 71.5 | 74.7 |
| Qwen3-30B-A3B-Thinking-2507 | 85.0 | 66.0 | 73.4 | 74.8 |
| NVIDIA-Nemotron-3-Nano-30B-A3B | 89.1 | 68.3 | 73.0 | 76.8 |
| DASD-30B-A3B-Thinking-Preview (Ours) | 86.7 | 72.8 | 72.3 | 77.3 |
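The Average column appears to be the unweighted mean of the three benchmark scores, rounded to one decimal place. That reading is an inference from the numbers rather than something stated above; the short check below recomputes it from the table values.

```python
# (AIME25, LiveCodeBench v6, GPQA-D, reported Average), copied from the table.
rows = {
    "gpt-oss-20b": (91.7, 61.0, 71.5, 74.7),
    "Qwen3-30B-A3B-Thinking-2507": (85.0, 66.0, 73.4, 74.8),
    "NVIDIA-Nemotron-3-Nano-30B-A3B": (89.1, 68.3, 73.0, 76.8),
    "DASD-30B-A3B-Thinking-Preview (Ours)": (86.7, 72.8, 72.3, 77.3),
}

# Unweighted mean of the three benchmarks, rounded to one decimal place.
recomputed = {
    name: round((aime25 + lcb_v6 + gpqa_d) / 3, 1)
    for name, (aime25, lcb_v6, gpqa_d, _reported) in rows.items()
}

for name, (_a, _l, _g, reported) in rows.items():
    print(f"{name}: recomputed={recomputed[name]} reported={reported}")
```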
## 📜 Dataset Access & License
The dataset is released under **[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)**.
## 📚 Citation
DASD-Thinking is developed by Alibaba Cloud, as part of our mission to advance open, efficient, and trustworthy reasoning systems. If you find this work useful in your research or applications, please cite our technical report.
```bibtex
@article{yan2026dasd,
title={Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning},
author={Yan, Shaotian and Liu, Kaiyuan and Shen, Chen and Wang, Bing and Fan, Sinan and Zhang, Jun and Wu, Yue and Wang, Zheng and Ye, Jieping},
year={2026},
journal={arXiv preprint arXiv:2601.09088},
url={https://arxiv.org/abs/2601.09088}
}
@article{liu2025where,
title={Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation},
author={Liu, Kaiyuan and Yan, Shaotian and Miao, Rui and Wang, Bing and Shen, Chen and Zhang, Jun and Ye, Jieping},
journal={arXiv preprint arXiv:2512.20908},
year={2025}
}
```
We welcome collaboration, feedback, and community contributions to push the boundaries of what small models can reason about—transparently and responsibly.
Provider: maas
Created: 2026-01-07



