Nemotron-Cascade-2-SFT-Data

Name: Nemotron-Cascade-2-SFT-Data
Creator: maas
Published: 2026-05-22 15:22:01
License: no description provided

ModelScope Community. Updated: 2026-05-22. Indexed: 2026-05-10.

Download link:

https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-2-SFT-Data

Resource description:

# Nemotron-Cascade-2-SFT-Data

We release the SFT data used for training [Nemotron-Cascade-2](https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B).

## Data sources

#### Math

Our non-proof math prompts are sourced from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Math-v2](https://huggingface.co/datasets/nvidia/Nemotron-Math-v2), with responses generated by DeepSeek-V3.2, DeepSeek-V3.2-Speciale, and GPT-OSS-120B. For mathematical proofs, prompts are taken from [Nemotron-Math-Proofs-v1](https://huggingface.co/datasets/nvidia/Nemotron-Math-Proofs-v1), with responses generated by DeepSeek-V3.2-Speciale.

#### Science

We collect science prompts from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Science-v1](https://huggingface.co/datasets/nvidia/Nemotron-Science-v1), covering physics, chemistry, and biology. Responses are generated by GPT-OSS-120B.

#### General Chat

We source general chat samples from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Instruction-Following-Chat-v1](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1).

#### Instruction Following

The samples are sourced from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Instruction-Following-Chat-v1](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1).

#### Safety

The samples are sourced from [Nemotron-SFT-Safety-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-Safety-v1).

#### Conversational Agent

The prompts are sourced from [Nemotron-Agentic-v1](https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1) and [Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1](https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1), with responses generated by Qwen3-235B-A22B-Thinking-2507, Qwen3-32B, Qwen3-235B-A22B-Instruct-2507, and GPT-OSS-120B.

#### Software Engineering Agent

We collect agentless samples from [Nemotron-Cascade-1-SFT-SWE](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-1-SFT-SWE), covering buggy code localization, code repair, and test case generation. Agentic samples are drawn from [SWE-Gym](https://huggingface.co/datasets/SWE-Gym/SWE-Gym), [SWE-rebench](https://huggingface.co/datasets/nebius/SWE-rebench), and [R2E-Gym-Subset](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset).

#### Terminal Agent

The samples are sourced from [Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus).

## Training

We pack all SFT samples into sequences of up to 256K tokens and train the model in a single stage. Empirically, we find that the SFT model reaches optimal performance after approximately 1.5 epochs.

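
To make the packing step concrete, here is a minimal sketch of one common approach, greedy first-fit packing over tokenized sample lengths. It is an illustration only: the card does not say which packing algorithm was used, the names `pack_samples` and `MAX_PACKED_TOKENS` are hypothetical, and reading "256K" as 262,144 tokens is an assumption.

```python
from typing import List

# Assumption: "256K" is read as 256 * 1024 tokens; the card does not spell this out.
MAX_PACKED_TOKENS = 256 * 1024


def pack_samples(sample_lengths: List[int],
                 capacity: int = MAX_PACKED_TOKENS) -> List[List[int]]:
    """Greedy first-fit packing (illustrative, not the authors' method).

    Takes the token length of each sample and returns a list of packed
    sequences, each given as the list of sample indices it contains.
    """
    packed: List[List[int]] = []  # sample indices per packed sequence
    free: List[int] = []          # remaining token budget per packed sequence

    for idx, length in enumerate(sample_lengths):
        if length > capacity:
            # A real pipeline might truncate or drop instead; this sketch refuses.
            raise ValueError(f"sample {idx} has {length} tokens, over {capacity}")
        for b, room in enumerate(free):
            if length <= room:
                # Place the sample in the first sequence with enough room.
                packed[b].append(idx)
                free[b] -= length
                break
        else:
            # No existing sequence fits: open a new packed sequence.
            packed.append([idx])
            free.append(capacity - length)
    return packed


if __name__ == "__main__":
    # Toy example with a capacity of 10 tokens instead of 256K.
    print(pack_samples([6, 5, 4, 3], capacity=10))
```

First-fit is simple but can leave gaps; sorting samples by length before packing usually wastes less of each sequence, at the cost of changing sample order.
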
| Hyperparameter | Value |
| :--- | :---: |
| Global Batch Size | 64 |
| Packed Sequence Length | 256K |
| Max Learning Rate | 5e-5 |
| Min Learning Rate | 5e-6 |
| Learning Rate Warmup Steps | 200 |
| Scheduler | cosine |
| Max Steps | 40,000 |
| Optimizer | AdamW |
| Optimizer Config | beta_1=0.9<br>beta_2=0.98 |
| Weight Decay | 0.1 |
| # of training steps | 33,000 |

## Statistics

| Domain | # Samples |
| :--- | :---: |
| Math | 5,226,364 |
| Science | 2,717,163 |
| General Chat | 13,972,873 |
| Instruction Following | 820,263 |
| Safety | 3,570 |
| Conversational Agent | 822,213 |
| Software Engineering Agent | 439,610 |
| Terminal Agent | 822,213 |

## Release Date

Mar 19, 2026

## License

Your use of this model is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Citation

```
@article{Nemotron_Cascade_2,
  title={Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation},
  author={Yang, Zhuolin and Liu, Zihan and Chen, Yang and Dai, Wenliang and Wang, Boxin and Lin, Sheng-Chieh and Lee, Chankyu and Chen, Yangyi and Jiang, Dongfu and He, Jiafan and Pi, Renjie and Lam, Grace and Lee, Nayeon and Bukharin, Alexander and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  year={2026}
}
```
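
As an illustration of the learning-rate settings in the hyperparameter table above, the sketch below computes a cosine schedule from the listed values (max 5e-5, min 5e-6, 200 warmup steps, 40,000 max steps). The linear shape of the warmup and the exact decay formula are assumptions, since the card only says "cosine", and the function name `learning_rate` is hypothetical.

```python
import math

# Values taken from the hyperparameter table.
MAX_LR = 5e-5
MIN_LR = 5e-6
WARMUP_STEPS = 200
MAX_STEPS = 40_000


def learning_rate(step: int) -> float:
    """Cosine schedule with warmup (illustrative sketch).

    Assumptions not stated in the card: warmup is linear, and the cosine
    decay runs from the end of warmup to MAX_STEPS, ending at MIN_LR.
    """
    if step < WARMUP_STEPS:
        # Linear ramp that reaches MAX_LR on the last warmup step.
        return MAX_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for s in (0, 199, 200, 20_000, 33_000, 40_000):
        print(f"step {s:>6}: lr = {learning_rate(s):.3e}")
```

Under these assumptions, stopping at the reported 33,000 steps leaves the learning rate somewhat above the 5e-6 floor, because the schedule is defined over the full 40,000 steps.
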


Provider:

maas

Created:

2026-03-20
