Nemotron-Cascade-SFT-Stage-1

Source: ModelScope community (updated 2026-01-06, indexed 2025-12-27)
Download link: https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-SFT-Stage-1
Description:
# Nemotron-Cascade-SFT-Stage-1

Supervised fine-tuning (SFT) for [Nemotron-Cascade](https://huggingface.co/collections/nvidia/nemotron-cascade) is performed in two stages. The [Stage-1 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-1) focuses on the math, code, science, and general domains, leveraging a broad and diverse collection of data sources. The [Stage-2 SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) further expands coverage to include math, code, science, tool calling, software engineering (SWE), instruction following, and general domains.

In Stage-1, the prompts for each domain come from the following sources:

- **Math:** questions from [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) and [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT).
- **Code:** prompts from [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning), [MagicoderEvolInstruct](https://huggingface.co/datasets/ise-uiuc/Magicoder-Evol-Instruct-110K), [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2), [LeetCode](https://huggingface.co/datasets/ibragim-bad/leetcode_solutions), [TACO](https://huggingface.co/datasets/BAAI/TACO), and [APPS](https://huggingface.co/datasets/codeparrot/apps).
- **Science:** prompts from [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1) and [S1K](https://huggingface.co/datasets/simplescaling/s1K).
- **General:** questions from [mmlu auxiliary train](https://huggingface.co/datasets/cais/mmlu), [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered), [SlimOrca](https://huggingface.co/datasets/Open-Orca/SlimOrca), [Magpie-Pro-300K-Filtered-H4](https://huggingface.co/datasets/HuggingFaceTB/Magpie-Pro-300K-Filtered-H4), [UltraInteract](https://huggingface.co/datasets/openbmb/UltraInteract_sft), [GPTeacher](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct), and [Nemotron-Post-Training-Dataset-v1](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v1).

All responses are generated with [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) and include explicit reasoning (thinking) processes. We generate multiple responses for most prompts.

## Data Statistics

### Overall

| **Domain** | **#Samples** |
| :---: | :---: |
| Math | 2,668,741 |
| Code | 1,301,591 |
| Science | 295,182 |
| General | 1,171,104 |
| Total | 5,436,618 |

### Math Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenMathReasoning | 270,534 | 2,147,570 |
| NuminaMath-CoT | 78,880 | 521,171 |

### Code Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| OpenCodeReasoning | 35,374 | 763,495 |
| MagicoderEvolInstruct | 27,625 | 27,625 |
| opc-sft-stage2 | 79,938 | 323,163 |
| LeetCode | 5,571 | 126,878 |
| TACO | 16,726 | 56,694 |
| APPS | 159 | 3,736 |

### Science Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| S1K | 826 | 1,904 |
| synthetic | 3,653 | 80,046 |
| Nemotron-Post-Training-Dataset-v1 | 62,903 | 213,232 |

### General Domain

| **Source** | **#Questions** | **#Samples** |
| :---: | :---: | :---: |
| mmlu auxiliary train | 94,039 | 103,973 |
| SlimOrca | 294,412 | 294,412 |
| UltraInteract | 28,916 | 28,916 |
| GPTeacher | 17,192 | 17,192 |
| ShareGPT_Vicuna_unfiltered | 133,658 | 140,870 |
| Magpie-Pro-300K-Filtered-H4 | 269,869 | 532,884 |
| Nemotron-Post-Training-Dataset-v1 | 44,073 | 52,857 |

## License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), available at https://creativecommons.org/licenses/by/4.0/legalcode.

## Release Date

Dec 15, 2025

## Citation

```
@article{Nemotron_Cascade_Scaling_Cascaded_Reinforcement_Learning,
  title={Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models},
  author={Wang, Boxin and Lee, Chankyu and Lee, Nayeon and Lin, Sheng-Chieh and Dai, Wenliang and Chen, Yang and Chen, Yangyi and Yang, Zhuolin and Liu, Zihan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2025}
}
```
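
## Checking the Data Statistics

The per-source tables in Data Statistics can be cross-checked against the overall table, and the statement that most prompts receive multiple responses can be made concrete by dividing #Samples by #Questions. The sketch below is not part of the dataset tooling. The names `STATS`, `DOMAIN_TOTALS`, `domain_samples`, and `samples_per_question` are illustrative, the numbers are copied by hand from the tables, and the ratio is only a rough proxy under the assumption that each sample pairs one question with one generated response.

```python
# (questions, samples) per source, copied from the Data Statistics tables.
STATS = {
    "Math": {
        "OpenMathReasoning": (270_534, 2_147_570),
        "NuminaMath-CoT": (78_880, 521_171),
    },
    "Code": {
        "OpenCodeReasoning": (35_374, 763_495),
        "MagicoderEvolInstruct": (27_625, 27_625),
        "opc-sft-stage2": (79_938, 323_163),
        "LeetCode": (5_571, 126_878),
        "TACO": (16_726, 56_694),
        "APPS": (159, 3_736),
    },
    "Science": {
        "S1K": (826, 1_904),
        "synthetic": (3_653, 80_046),
        "Nemotron-Post-Training-Dataset-v1": (62_903, 213_232),
    },
    "General": {
        "mmlu auxiliary train": (94_039, 103_973),
        "SlimOrca": (294_412, 294_412),
        "UltraInteract": (28_916, 28_916),
        "GPTeacher": (17_192, 17_192),
        "ShareGPT_Vicuna_unfiltered": (133_658, 140_870),
        "Magpie-Pro-300K-Filtered-H4": (269_869, 532_884),
        "Nemotron-Post-Training-Dataset-v1": (44_073, 52_857),
    },
}

# Domain totals and grand total, copied from the "Overall" table.
DOMAIN_TOTALS = {
    "Math": 2_668_741,
    "Code": 1_301_591,
    "Science": 295_182,
    "General": 1_171_104,
}
GRAND_TOTAL = 5_436_618


def domain_samples(domain: str) -> int:
    """Sum the #Samples column over all sources of one domain."""
    return sum(samples for _, samples in STATS[domain].values())


def samples_per_question(domain: str, source: str) -> float:
    """Average number of samples per question for one source."""
    questions, samples = STATS[domain][source]
    return samples / questions


if __name__ == "__main__":
    for domain, sources in STATS.items():
        status = "ok" if domain_samples(domain) == DOMAIN_TOTALS[domain] else "MISMATCH"
        print(f"{domain}: {domain_samples(domain):,} samples ({status})")
        for source in sources:
            print(f"  {source}: {samples_per_question(domain, source):.2f} samples/question")
```

Sources whose two columns are equal, such as SlimOrca or MagicoderEvolInstruct, come out at exactly one sample per question, which is consistent with the card saying "most" rather than "all" prompts have multiple responses.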

Provider: maas
Created: 2025-12-17