# Nemotron-Cascade-SFT-Stage-1

Listed on ModelScope (indexed 2025-12-27, updated 2026-01-06).

Download: https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-SFT-Stage-1
Supervised fine-tuning (SFT) for [Nemotron-Cascade](https://huggingface.co/collections/nvidia/nemotron-cascade) is performed in two stages. The [Stage-1 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-1) focuses on the math, code, science, and general domains and draws on a broad and diverse collection of data sources. The [Stage-2 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) keeps the math, code, science, and general domains and extends coverage to tool calling, software engineering (SWE), and instruction following.
In Stage-1, the math domain incorporates questions from [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) and [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT). The code domain draws prompts from [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning), [MagicoderEvolInstruct](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2), [LeetCode](https://huggingface.co/datasets/ibragim-bad/leetcode_solutions), [TACO](https://huggingface.co/datasets/BAAI/TACO), and [APPS](https://huggingface.co/datasets/codeparrot/apps). The science domain is built from prompts in [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) and [S1K](https://huggingface.co/datasets/simplescaling/s1K). The general domain includes questions from [mmlu auxiliary train](https://huggingface.co/datasets/cais/mmlu), [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), [Magpie-Pro-300K-Filtered-H4](https://huggingface.co/datasets/HuggingFaceTB/Magpie-Pro-300K-Filtered-H4), [UltraInteract](https://huggingface.co/datasets/openbmb/UltraInteract_sft), [GPTeacher](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct), and [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1).
All responses are generated with [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) and include an explicit reasoning (thinking) process. We generate multiple responses for most prompts, which is why #Samples exceeds #Questions for most sources in the tables below.
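Every response carries an explicit reasoning trace, so an SFT pipeline may need to separate the trace from the final answer. The sketch below assumes the trace is wrapped in `<think>...</think>` tags, which is the delimiter DeepSeek-R1 emits. This card does not document how responses are stored, so check the format against real samples before relying on it. The helper name `split_reasoning` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the reasoning trace is delimited by DeepSeek-R1-style <think> tags.
# This dataset card does not document the stored response format.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[Optional[str], str]:
    """Split a response into (reasoning, final_answer).

    If no <think>...</think> block is found, returns (None, stripped response).
    """
    match = _THINK_RE.search(response)
    if match is None:
        return None, response.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the final answer.
    answer = response[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    demo = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4."
    print(split_reasoning(demo))  # ('2 + 2 is 4.', 'The answer is 4.')
```

If the stored samples use a different delimiter, or keep the reasoning in a separate field, adjust `_THINK_RE` or drop the split entirely.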
## Data Statistics
### Overall
| **Domain** | **#Samples** |
| :---: | :---: |
| Math | 2,668,741 |
| Code | 1,301,591 |
| Science | 295,182 |
| General | 1,171,104 |
| Total | 5,436,618 |
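You can sanity-check the Overall table with a few lines of arithmetic. The sketch below copies the counts from the table, confirms that the four domain counts add up to the reported total, and prints each domain's share of the corpus.

```python
# Domain-level sample counts copied from the Overall table.
DOMAIN_SAMPLES = {
    "Math": 2_668_741,
    "Code": 1_301_591,
    "Science": 295_182,
    "General": 1_171_104,
}
REPORTED_TOTAL = 5_436_618


def domain_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each domain's fraction of the summed sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    # The per-domain counts should add up exactly to the reported total.
    assert sum(DOMAIN_SAMPLES.values()) == REPORTED_TOTAL
    for name, share in domain_shares(DOMAIN_SAMPLES).items():
        print(f"{name:<8} {share:6.1%}")
```

The counts do add up. Math alone accounts for about 49.1% of the samples, followed by Code (23.9%), General (21.5%), and Science (5.4%).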
### Math Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenMathReasoning | 270,534 | 2,147,570 |
| NuminaMath-CoT | 78,880 | 521,171 |
### Code Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenCodeReasoning | 35,374 | 763,495 |
| MagicoderEvolInstruct | 27,625 | 27,625 |
| opc-sft-stage2 | 79,938 | 323,163 |
| LeetCode | 5,571 | 126,878 |
| TACO | 16,726 | 56,694 |
| APPS | 159 | 3,736 |
### Science Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| S1K | 826 | 1,904 |
| synthetic | 3,653 | 80,046 |
| Nemotron-Post-<br>Training-Dataset-v1 | 62,903 | 213,232 |
### General Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| mmlu auxiliary train | 94,039 | 103,973 |
| SlimOrca | 294,412 | 294,412 |
| UltraInteract | 28,916 | 28,916 |
| GPTeacher | 17,192 | 17,192 |
| ShareGPT_Vicuna_unfiltered | 133,658 | 140,870 |
| Magpie-Pro-300K-Filtered-H4 | 269,869 | 532,884 |
| Nemotron-Post-<br>Training-Dataset-v1 | 44,073 | 52,857 |
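Most prompts received more than one DeepSeek-R1 response, so the ratio of #Samples to #Questions gives the average number of responses per question for each source. The sketch below copies the per-source counts from the four domain tables. It checks that each domain's per-source samples add up to the figure in the Overall table, then prints the per-source ratios. A ratio of exactly 1.0 (for example, MagicoderEvolInstruct or SlimOrca) is consistent with one response per question. That is an inference from the counts; the card does not state it.

```python
# Per-source (#Questions, #Samples) counts copied from the domain tables.
SOURCES: dict[str, dict[str, tuple[int, int]]] = {
    "Math": {
        "OpenMathReasoning": (270_534, 2_147_570),
        "NuminaMath-CoT": (78_880, 521_171),
    },
    "Code": {
        "OpenCodeReasoning": (35_374, 763_495),
        "MagicoderEvolInstruct": (27_625, 27_625),
        "opc-sft-stage2": (79_938, 323_163),
        "LeetCode": (5_571, 126_878),
        "TACO": (16_726, 56_694),
        "APPS": (159, 3_736),
    },
    "Science": {
        "S1K": (826, 1_904),
        "synthetic": (3_653, 80_046),
        "Nemotron-Post-Training-Dataset-v1": (62_903, 213_232),
    },
    "General": {
        "mmlu auxiliary train": (94_039, 103_973),
        "SlimOrca": (294_412, 294_412),
        "UltraInteract": (28_916, 28_916),
        "GPTeacher": (17_192, 17_192),
        "ShareGPT_Vicuna_unfiltered": (133_658, 140_870),
        "Magpie-Pro-300K-Filtered-H4": (269_869, 532_884),
        "Nemotron-Post-Training-Dataset-v1": (44_073, 52_857),
    },
}

# Domain totals from the Overall table, used as a cross-check.
OVERALL = {"Math": 2_668_741, "Code": 1_301_591, "Science": 295_182, "General": 1_171_104}


def samples_per_question(questions: int, samples: int) -> float:
    """Average number of responses generated per question."""
    return samples / questions


def check_domain_totals() -> None:
    """Raise AssertionError if a domain's per-source samples disagree with OVERALL."""
    for domain, rows in SOURCES.items():
        summed = sum(samples for _, samples in rows.values())
        assert summed == OVERALL[domain], (domain, summed, OVERALL[domain])


if __name__ == "__main__":
    check_domain_totals()
    for domain, rows in SOURCES.items():
        for name, (q, s) in rows.items():
            print(f"{domain:<8} {name:<36} {samples_per_question(q, s):5.2f}")
```

Running it shows about 7.94 responses per question for OpenMathReasoning and about 21.58 for OpenCodeReasoning. UltraInteract, GPTeacher, and the other single-response sources sit at exactly 1.00.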
## License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), available at https://creativecommons.org/licenses/by/4.0/legalcode.
## Release Date
Dec 15, 2025
## Citation
```
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
year={2025}
}
```
Provided by: maas

Created: 2025-12-17



