Nemotron-Cascade-SFT-Stage-2
Source: ModelScope Community
Updated: 2026-01-06
Indexed: 2025-12-27
Download link: https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-SFT-Stage-2
Resource summary:
# Nemotron-Cascade-SFT-Stage-2
Supervised fine-tuning (SFT) for [Nemotron-Cascade](https://huggingface.co/collections/nvidia/nemotron-cascade) is performed in two stages. The [Stage-1 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-1) focuses on the math, code, science, and general domains, drawing on a broad and diverse collection of data sources. The [Stage-2 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) keeps these four domains and further expands coverage to tool calling, software engineering (SWE), and instruction following.
In Stage-2, math questions are drawn from [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning). Code questions come from [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning), [MagicoderEvolInstruct](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2), [TACO](https://huggingface.co/datasets/BAAI/TACO), and [APPS](https://huggingface.co/datasets/codeparrot/apps). The science domain is built from data in [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1), and the tool calling dataset is also derived from [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1). Prompts for the software engineering (SWE) dataset are collected from [SWE-Bench-Train](https://huggingface.co/datasets/princeton-nlp/SWE-bench), [SWE-reBench](https://huggingface.co/datasets/nebius/SWE-rebench), [SWE-Smith](https://huggingface.co/datasets/SWE-bench/SWE-smith), [R2E-Gym/R2E-Gym-Subset](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset), and [SWE-Fixer-Train](https://huggingface.co/datasets/internlm/SWE-Fixer-Train-110K).
The general domain incorporates questions from [mmlu auxiliary train](https://huggingface.co/datasets/cais/mmlu), [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), [Magpie-Pro-300K-Filtered-H4](https://huggingface.co/datasets/HuggingFaceTB/Magpie-Pro-300K-Filtered-H4), [UltraInteract](https://huggingface.co/datasets/openbmb/UltraInteract_sft), [GPTeacher](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct), [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1), [allenai/qasc](https://huggingface.co/datasets/allenai/qasc), and [FLAN V2](https://huggingface.co/datasets/ostapeno/tulu_v2_flan_v2_subset). The instruction following prompts are sourced from [tulu-3-sft-personas-instruction-following](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following).
Responses with explicit reasoning (thinking) traces are generated using [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528), while responses without thinking traces are produced using [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) or [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324). We generate multiple responses for most prompts. In multi-turn conversations, we store only summaries of the assistant’s outputs to reduce token usage. Note that, in the final Stage-2 SFT blend, science, SWE, and general multi-turn conversation data are upsampled 3×.
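Upsampling by 3× means that each record in the affected subsets appears three times in the final training blend. The card does not say how the upsampling is implemented, so the following is only a minimal sketch, assuming plain repetition; the subset names (`science`, `swe`, `general_multiturn`) and the toy records are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-subset upsampling factors: 3x for science, SWE, and general
# multi-turn data as stated above; every other subset is assumed to stay at 1x.
UPSAMPLE = {"science": 3, "swe": 3, "general_multiturn": 3}


def build_blend(subsets: Dict[str, List[dict]]) -> List[dict]:
    """Concatenate subsets, repeating each record by its subset's factor."""
    blend: List[dict] = []
    for name, records in subsets.items():
        factor = UPSAMPLE.get(name, 1)
        for _ in range(factor):
            blend.extend(records)
    return blend


toy = {
    "math": [{"id": "m1"}, {"id": "m2"}],
    "science": [{"id": "s1"}],
    "swe": [{"id": "w1"}, {"id": "w2"}],
}
blend = build_blend(toy)
print(len(blend))  # 2 math + 3 science + 6 swe = 11
```

Plain repetition keeps the sketch simple; a real training pipeline would usually shuffle the blend afterwards.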
## Data Statistics
### Overall
| **Domain** | **#Samples** |
| :---: | :---: |
| Math | 1,877,122 |
| Code | 1,392,831 |
| Science | 311,316 |
| General | 3,550,456 |
| Tool Calling | 308,797 |
| Instruction Following | 146,385 |
| SWE Repair | 86,907 |
| SWE Localization | 92,265 |
| SWE TestGen | 31,651 |
| Total | 7,797,730 |
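
The domain rows above can be checked against the stated total and turned into each domain's share of the Stage-2 data. The counts are copied from the table; the script is only an illustrative consistency check.

```python
# Sample counts per domain, copied from the "Overall" table.
DOMAIN_SAMPLES = {
    "Math": 1_877_122,
    "Code": 1_392_831,
    "Science": 311_316,
    "General": 3_550_456,
    "Tool Calling": 308_797,
    "Instruction Following": 146_385,
    "SWE Repair": 86_907,
    "SWE Localization": 92_265,
    "SWE TestGen": 31_651,
}
STATED_TOTAL = 7_797_730

total = sum(DOMAIN_SAMPLES.values())
assert total == STATED_TOTAL, f"rows sum to {total}, table says {STATED_TOTAL}"

# Print each domain's share of the total, largest first.
for domain, n in sorted(DOMAIN_SAMPLES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{domain:<22} {n:>10,} {100 * n / total:5.1f}%")
```

The rows do add up to 7,797,730; General alone accounts for about 45.5% of the samples and Math for about 24.1%.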
### Math Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenMathReasoning | 311,505 | 1,877,122 |
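
Assuming each sample is one generated response to one of the listed questions, dividing samples by questions gives the average number of responses kept per math question, which is one way to read "multiple responses for most prompts":

$$
\frac{1{,}877{,}122\ \text{samples}}{311{,}505\ \text{questions}} \approx 6.03\ \text{samples per question}
$$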
### Code Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenCodeReasoning | 34,747 | 1,143,266 |
| TACO | 5,852 | 58,273 |
| opc-sft-stage2 | 37,820 | 188,063 |
| apps | 159 | 1,575 |
| synthetic | 274 | 1,654 |
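
The same samples-per-question ratio varies widely across the code sources. A small illustrative check, with counts copied from the table, that reproduces the Code subtotal and prints the per-source average:

```python
# (questions, samples) per code source, copied from the "Code Domain" table.
CODE_SOURCES = {
    "OpenCodeReasoning": (34_747, 1_143_266),
    "TACO": (5_852, 58_273),
    "opc-sft-stage2": (37_820, 188_063),
    "apps": (159, 1_575),
    "synthetic": (274, 1_654),
}

subtotal = sum(samples for _, samples in CODE_SOURCES.values())
assert subtotal == 1_392_831  # matches the Code row of the "Overall" table

for source, (questions, samples) in CODE_SOURCES.items():
    print(f"{source:<18} {samples / questions:6.1f} samples per question")
```

OpenCodeReasoning averages about 33 samples per question, whereas opc-sft-stage2 averages about 5.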
### Science Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| synthetic | 3,694 | 84,389 |
| Nemotron-Post-Training-Dataset-v1 | 188,461 | 226,927 |
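
The two science sources add up to the Science row of the overall table, but they contribute very different numbers of samples per question:

$$
84{,}389 + 226{,}927 = 311{,}316, \qquad \frac{84{,}389}{3{,}694} \approx 22.8, \qquad \frac{226{,}927}{188{,}461} \approx 1.20
$$

The synthetic science questions are therefore sampled far more densely than those taken from Nemotron-Post-Training-Dataset-v1.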
### General Domain
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| mmlu_auxiliary_train | 96,634 | 448,120 |
| HuggingFaceTB/smoltalk | 269,974 | 799,521 |
| SlimOrca | 293,305 | 586,091 |
| synthetic | 688,364 | 1,196,807 |
| ShareGPT_Vicuna_unfiltered | 140,563 | 280,860 |
| nvidia/Nemotron-Post-Training-Dataset-v1 | 45,186 | 95,792 |
| allenai/qasc | 6,932 | 30,628 |
| UltraInteract_sft | 29,041 | 57,936 |
| GPTeacher-General-Instruct | 17,203 | 34,360 |
| flan_v2 | 11,285 | 20,341 |
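
The general domain is the largest in Stage-2, so it is worth seeing how concentrated it is. This illustrative script, using counts copied from the table, checks the General subtotal and prints the cumulative share of samples, largest source first.

```python
# (questions, samples) per general-domain source, copied from the table.
GENERAL_SOURCES = {
    "mmlu_auxiliary_train": (96_634, 448_120),
    "HuggingFaceTB/smoltalk": (269_974, 799_521),
    "SlimOrca": (293_305, 586_091),
    "synthetic": (688_364, 1_196_807),
    "ShareGPT_Vicuna_unfiltered": (140_563, 280_860),
    "nvidia/Nemotron-Post-Training-Dataset-v1": (45_186, 95_792),
    "allenai/qasc": (6_932, 30_628),
    "UltraInteract_sft": (29_041, 57_936),
    "GPTeacher-General-Instruct": (17_203, 34_360),
    "flan_v2": (11_285, 20_341),
}

total_samples = sum(s for _, s in GENERAL_SOURCES.values())
total_questions = sum(q for q, _ in GENERAL_SOURCES.values())
assert total_samples == 3_550_456  # matches the General row of the "Overall" table

# Cumulative share of samples, largest source first.
running = 0
ranked = sorted(GENERAL_SOURCES.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (_, samples) in ranked:
    running += samples
    print(f"{name:<42} {100 * running / total_samples:5.1f}% cumulative")

avg = total_samples / total_questions
print(f"{total_questions:,} questions, {avg:.2f} samples per question on average")
```

The three largest sources (synthetic, HuggingFaceTB/smoltalk, and SlimOrca) together make up roughly 73% of the general-domain samples.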
### Tool Calling
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| Nemotron-Post-Training-Dataset-v1 | 308,797 | 308,797 |
### Instruction Following
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| allenai/tulu-3-sft-personas-instruction-following | 29,918 | 146,385 |
### Software Engineering (SWE)
#### SWE Repair
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| princeton-nlp/SWE-bench | 3,676 | 19,540 |
| nebius/SWE-rebench | 2,822 | 16,441 |
| SWE-bench/SWE-smith | 2,822 | 17,625 |
| internlm/SWE-Fixer-Train-110K | 22,367 | 33,301 |
#### SWE Localization
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| internlm/SWE-Fixer-Train-110K | 52,703 | 53,230 |
| SWE-bench/SWE-smith | 9,707 | 9,714 |
| princeton-nlp/SWE-bench | 10,251 | 16,184 |
| nebius/SWE-rebench | 9,654 | 9,693 |
| R2E-Gym/R2E-Gym-Subset | 3,444 | 3,444 |
#### SWE TestGen
| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| SWE-bench/SWE-smith | 2,749 | 2,881 |
| princeton-nlp/SWE-bench | 4,817 | 6,066 |
| nebius/SWE-rebench | 3,021 | 3,026 |
| internlm/SWE-Fixer-Train-110K | 18,792 | 19,678 |
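
Several SWE sources appear in more than one task table. The sketch below, with sample counts copied from the three tables, checks each task against its row in the "Overall" table and aggregates samples per source across Repair, Localization, and TestGen.

```python
from collections import defaultdict

# Sample counts per SWE task and source, copied from the three SWE tables.
SWE_SAMPLES = {
    "Repair": {
        "princeton-nlp/SWE-bench": 19_540,
        "nebius/SWE-rebench": 16_441,
        "SWE-bench/SWE-smith": 17_625,
        "internlm/SWE-Fixer-Train-110K": 33_301,
    },
    "Localization": {
        "internlm/SWE-Fixer-Train-110K": 53_230,
        "SWE-bench/SWE-smith": 9_714,
        "princeton-nlp/SWE-bench": 16_184,
        "nebius/SWE-rebench": 9_693,
        "R2E-Gym/R2E-Gym-Subset": 3_444,
    },
    "TestGen": {
        "SWE-bench/SWE-smith": 2_881,
        "princeton-nlp/SWE-bench": 6_066,
        "nebius/SWE-rebench": 3_026,
        "internlm/SWE-Fixer-Train-110K": 19_678,
    },
}
# Per-task totals from the "Overall" table.
OVERALL = {"Repair": 86_907, "Localization": 92_265, "TestGen": 31_651}

per_source = defaultdict(int)
for task, sources in SWE_SAMPLES.items():
    assert sum(sources.values()) == OVERALL[task], task
    for source, n in sources.items():
        per_source[source] += n

for source, n in sorted(per_source.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source:<32} {n:>8,}")
print(f"{'all SWE tasks':<32} {sum(per_source.values()):>8,}")
```

Summed over the three tasks, SWE-Fixer-Train-110K contributes 106,209 of the 210,823 SWE samples, about half.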
## License
This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), available at https://creativecommons.org/licenses/by/4.0/legalcode.
## Release Date
Dec 15, 2025
## Citation
```
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
year={2025}
}
```
Provided by: maas
Created: 2025-12-17
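Background overview
Nemotron-Cascade-SFT-Stage-2 is the second-stage supervised fine-tuning dataset for Nemotron-Cascade. It covers the math, code, science, tool calling, software engineering, instruction following, and general domains, with data drawn from several source datasets such as OpenMathReasoning and OpenCodeReasoning. The dataset contains about 7.8 million samples, is released under the CC BY 4.0 license, and uses DeepSeek models to generate its responses (DeepSeek-R1-0528 for responses with reasoning traces, DeepSeek-V3 or DeepSeek-V3-0324 for responses without them).
The above summary was collected and generated by 遇见数据集.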



