Nemotron-Cascade-SFT-Stage-2

Source: ModelScope (魔搭社区), updated 2026-01-06, indexed 2025-12-27
Download link: https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-SFT-Stage-2
Resource description:
# Nemotron-Cascade-SFT-Stage-2

Supervised fine-tuning (SFT) for [Nemotron-Cascade](https://huggingface.co/collections/nvidia/nemotron-cascade) is performed in two stages. The [Stage-1 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-1) focuses on the math, code, science, and general domains, drawing on a broad and diverse collection of data sources. The [Stage-2 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) covers the math, code, science, tool calling, software engineering (SWE), instruction following, and general domains, extending the Stage-1 coverage with tool calling, SWE, and instruction following.

In Stage-2, the math domain uses questions from [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning). The code domain draws its questions from [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning), [MagicoderEvolInstruct](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2), [TACO](https://huggingface.co/datasets/BAAI/TACO), and [APPS](https://huggingface.co/datasets/codeparrot/apps). The science domain is built from [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1), and the tool calling data is derived from the same dataset. Prompts for the SWE data are collected from [SWE-Bench-Train](https://huggingface.co/datasets/princeton-nlp/SWE-bench), [SWE-reBench](https://huggingface.co/datasets/nebius/SWE-rebench), [SWE-Smith](https://huggingface.co/datasets/SWE-bench/SWE-smith), [R2E-Gym/R2E-Gym-Subset](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset), and [SWE-Fixer-Train](https://huggingface.co/datasets/internlm/SWE-Fixer-Train-110K).
The general domain incorporates questions from [mmlu auxiliary train](https://huggingface.co/datasets/cais/mmlu), [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), [Magpie-Pro-300K-Filtered-H4](https://huggingface.co/datasets/HuggingFaceTB/Magpie-Pro-300K-Filtered-H4), [UltraInteract](https://huggingface.co/datasets/openbmb/UltraInteract_sft), [GPTeacher](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct), [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1), [allenai/qasc](https://huggingface.co/datasets/allenai/qasc), and [FLAN V2](https://huggingface.co/datasets/ostapeno/tulu_v2_flan_v2_subset). The instruction following prompts are sourced from [tulu-3-sft-personas-instruction-following](https://huggingface.co/datasets/allenai/tulu-3-sft-personas-instruction-following).

Responses with explicit reasoning (thinking) traces are generated using [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528), while responses without thinking traces are produced using [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) or [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324). We generate multiple responses for most prompts. In multi-turn conversations, we store only summaries of the assistant's outputs to reduce token usage. Note that in the final Stage-2 SFT blend, science, SWE, and general multi-turn conversation data are upsampled 3×.
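The 3× upsampling described above can be pictured as a filter-and-repeat pass over the records. This is a minimal sketch under stated assumptions, not NVIDIA's blending code: the `Record` layout (`domain`, `num_turns`), the domain labels, and the `should_upsample` predicate are all hypothetical, and the predicate reads the sentence as "all science data, all SWE data, and multi-turn general data".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; the released files may use different fields.
    domain: str      # e.g. "math", "science", "swe_repair", "general"
    num_turns: int   # number of user turns in the conversation


def should_upsample(rec: Record) -> bool:
    """Hypothetical predicate: science, any SWE subset, and multi-turn general data."""
    if rec.domain == "science" or rec.domain.startswith("swe"):
        return True
    return rec.domain == "general" and rec.num_turns > 1


def upsample(records: Iterable[Record],
             predicate: Callable[[Record], bool],
             factor: int = 3) -> List[Record]:
    """Repeat each record matching `predicate` `factor` times; keep the rest once."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    blend: List[Record] = []
    for rec in records:
        copies = factor if predicate(rec) else 1
        blend.extend([rec] * copies)
    return blend


if __name__ == "__main__":
    toy = [
        Record("math", 1),
        Record("science", 1),
        Record("swe_repair", 1),
        Record("general", 1),   # single-turn general: kept once
        Record("general", 3),   # multi-turn general: repeated
    ]
    blend = upsample(toy, should_upsample, factor=3)
    print(len(blend))  # 1 + 3 + 3 + 1 + 3 = 11
```

Repeating records in place keeps the duplicates adjacent; a training pipeline would presumably shuffle the blend afterwards, for example with `random.Random(seed).shuffle(blend)`.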
## Data Statistics

### Overall

| **Domain** | **#Samples** |
| :---: | :---: |
| Math | 1,877,122 |
| Code | 1,392,831 |
| Science | 311,316 |
| General | 3,550,456 |
| Tool Calling | 308,797 |
| Instruction Following | 146,385 |
| SWE Repair | 86,907 |
| SWE Localization | 92,265 |
| SWE TestGen | 31,651 |
| Total | 7,797,730 |

### Math Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenMathReasoning | 311,505 | 1,877,122 |

### Code Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenCodeReasoning | 34,747 | 1,143,266 |
| TACO | 5,852 | 58,273 |
| opc-sft-stage2 | 37,820 | 188,063 |
| apps | 159 | 1,575 |
| synthetic | 274 | 1,654 |

### Science Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| synthetic | 3,694 | 84,389 |
| Nemotron-Post-Training-Dataset-v1 | 188,461 | 226,927 |

### General Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| mmlu_auxiliary_train | 96,634 | 448,120 |
| HuggingFaceTB/smoltalk | 269,974 | 799,521 |
| SlimOrca | 293,305 | 586,091 |
| synthetic | 688,364 | 1,196,807 |
| ShareGPT_Vicuna_unfiltered | 140,563 | 280,860 |
| nvidia/Nemotron-Post-Training-Dataset-v1 | 45,186 | 95,792 |
| allenai/qasc | 6,932 | 30,628 |
| UltraInteract_sft | 29,041 | 57,936 |
| GPTeacher-General-Instruct | 17,203 | 34,360 |
| flan_v2 | 11,285 | 20,341 |

### Tool Calling

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| Nemotron-Post-Training-Dataset-v1 | 308,797 | 308,797 |

### Instruction Following

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| allenai/tulu-3-sft-personas-instruction-following | 29,918 | 146,385 |

### Software Engineering (SWE)

#### SWE Repair

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| princeton-nlp/SWE-bench | 3,676 | 19,540 |
| nebius/SWE-rebench | 2,822 | 16,441 |
| SWE-bench/SWE-smith | 2,822 | 17,625 |
| internlm/SWE-Fixer-Train-110K | 22,367 | 33,301 |

#### SWE Localization

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| internlm/SWE-Fixer-Train-110K | 52,703 | 53,230 |
| SWE-bench/SWE-smith | 9,707 | 9,714 |
| princeton-nlp/SWE-bench | 10,251 | 16,184 |
| nebius/SWE-rebench | 9,654 | 9,693 |
| R2E-Gym/R2E-Gym-Subset | 3,444 | 3,444 |

#### SWE TestGen

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| SWE-bench/SWE-smith | 2,749 | 2,881 |
| princeton-nlp/SWE-bench | 4,817 | 6,066 |
| nebius/SWE-rebench | 3,021 | 3,026 |
| internlm/SWE-Fixer-Train-110K | 18,792 | 19,678 |

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), available at https://creativecommons.org/licenses/by/4.0/legalcode.

## Release Date

Dec 15, 2025

## Citation

```
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2025}
}
```
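The per-domain totals in the Data Statistics tables can be cross-checked mechanically. The sketch below transcribes the (#Questions, #Samples) pairs from the tables, sums each domain's samples against the Overall table, and computes the average number of samples per question, which reflects the note that multiple responses are generated for most prompts. The dictionary names and helper functions are illustrative only and are not part of any released tooling.

```python
# Cross-check of the Nemotron-Cascade-SFT-Stage-2 statistics tables.
# Each entry is (#Questions, #Samples) for one source, copied from the card.
PER_SOURCE = {
    "Math": {"OpenMathReasoning": (311_505, 1_877_122)},
    "Code": {
        "OpenCodeReasoning": (34_747, 1_143_266),
        "TACO": (5_852, 58_273),
        "opc-sft-stage2": (37_820, 188_063),
        "apps": (159, 1_575),
        "synthetic": (274, 1_654),
    },
    "Science": {
        "synthetic": (3_694, 84_389),
        "Nemotron-Post-Training-Dataset-v1": (188_461, 226_927),
    },
    "General": {
        "mmlu_auxiliary_train": (96_634, 448_120),
        "HuggingFaceTB/smoltalk": (269_974, 799_521),
        "SlimOrca": (293_305, 586_091),
        "synthetic": (688_364, 1_196_807),
        "ShareGPT_Vicuna_unfiltered": (140_563, 280_860),
        "nvidia/Nemotron-Post-Training-Dataset-v1": (45_186, 95_792),
        "allenai/qasc": (6_932, 30_628),
        "UltraInteract_sft": (29_041, 57_936),
        "GPTeacher-General-Instruct": (17_203, 34_360),
        "flan_v2": (11_285, 20_341),
    },
    "Tool Calling": {"Nemotron-Post-Training-Dataset-v1": (308_797, 308_797)},
    "Instruction Following": {
        "allenai/tulu-3-sft-personas-instruction-following": (29_918, 146_385),
    },
    "SWE Repair": {
        "princeton-nlp/SWE-bench": (3_676, 19_540),
        "nebius/SWE-rebench": (2_822, 16_441),
        "SWE-bench/SWE-smith": (2_822, 17_625),
        "internlm/SWE-Fixer-Train-110K": (22_367, 33_301),
    },
    "SWE Localization": {
        "internlm/SWE-Fixer-Train-110K": (52_703, 53_230),
        "SWE-bench/SWE-smith": (9_707, 9_714),
        "princeton-nlp/SWE-bench": (10_251, 16_184),
        "nebius/SWE-rebench": (9_654, 9_693),
        "R2E-Gym/R2E-Gym-Subset": (3_444, 3_444),
    },
    "SWE TestGen": {
        "SWE-bench/SWE-smith": (2_749, 2_881),
        "princeton-nlp/SWE-bench": (4_817, 6_066),
        "nebius/SWE-rebench": (3_021, 3_026),
        "internlm/SWE-Fixer-Train-110K": (18_792, 19_678),
    },
}

# Domain totals and grand total from the "Overall" table.
OVERALL = {
    "Math": 1_877_122, "Code": 1_392_831, "Science": 311_316,
    "General": 3_550_456, "Tool Calling": 308_797,
    "Instruction Following": 146_385, "SWE Repair": 86_907,
    "SWE Localization": 92_265, "SWE TestGen": 31_651,
}
TOTAL = 7_797_730


def domain_sample_sums(per_source):
    """Sum #Samples over all sources of each domain."""
    return {d: sum(s for _, s in srcs.values()) for d, srcs in per_source.items()}


def samples_per_question(per_source):
    """Average stored samples (responses) per question, keyed by (domain, source)."""
    return {
        (d, name): s / q
        for d, srcs in per_source.items()
        for name, (q, s) in srcs.items()
    }


if __name__ == "__main__":
    sums = domain_sample_sums(PER_SOURCE)
    for domain, expected in OVERALL.items():
        status = "ok" if sums[domain] == expected else "MISMATCH"
        print(f"{domain:<22} {sums[domain]:>10,}  {status}")
    grand = sum(OVERALL.values())
    print(f"{'Total':<22} {grand:>10,}  {'ok' if grand == TOTAL else 'MISMATCH'}")
    ratios = samples_per_question(PER_SOURCE)
    # Show the three sources with the most samples per question.
    for (domain, source), r in sorted(ratios.items(), key=lambda kv: -kv[1])[:3]:
        print(f"{domain}/{source}: {r:.1f} samples per question")
```

With the counts as transcribed, every domain sum matches the Overall table and the domain totals add up to 7,797,730. The samples-per-question ratio varies widely by source, from 1.0 for tool calling and R2E-Gym to about 33 for OpenCodeReasoning.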

Provider: maas
Created: 2025-12-17
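Background overview

Nemotron-Cascade-SFT-Stage-2 is the second-stage supervised fine-tuning dataset for Nemotron-Cascade. It covers the math, code, science, tool calling, software engineering, instruction following, and general domains, with questions drawn from sources such as OpenMathReasoning and OpenCodeReasoning. The dataset contains about 7.8 million samples, is released under CC BY 4.0, and uses DeepSeek models to generate its responses: DeepSeek-R1-0528 for responses with reasoning traces, and DeepSeek-V3 or DeepSeek-V3-0324 for responses without.

The summary above was collected and generated by 遇见数据集.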