Nemotron-Cascade-2-SFT-Data
ModelScope community. Updated 2026-05-22; indexed 2026-05-10.
Download link:
https://modelscope.cn/datasets/nv-community/Nemotron-Cascade-2-SFT-Data
# Nemotron-Cascade-2-SFT-Data
We release the SFT data used for training [Nemotron-Cascade-2](https://huggingface.co/nvidia/Nemotron-Cascade-2-30B-A3B).
## Data sources
#### Math
Our non-proof math prompts are sourced from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Math-v2](https://huggingface.co/datasets/nvidia/Nemotron-Math-v2), with responses generated by DeepSeek-V3.2, DeepSeek-V3.2-Speciale, and GPT-OSS-120B. For mathematical proofs, prompts are taken from [Nemotron-Math-Proofs-v1](https://huggingface.co/datasets/nvidia/Nemotron-Math-Proofs-v1), with responses generated by DeepSeek-V3.2-Speciale.
#### Science
We collect science prompts from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Science-v1](https://huggingface.co/datasets/nvidia/Nemotron-Science-v1), covering physics, chemistry, and biology. Responses are generated by GPT-OSS-120B.
#### General Chat
We source general chat samples from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Instruction-Following-Chat-v1](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1).
#### Instruction Following
The samples are sourced from [Nemotron-Cascade-1-SFT](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-SFT-Stage-2) and [Nemotron-Instruction-Following-Chat-v1](https://huggingface.co/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1).
#### Safety
The samples are sourced from [Nemotron-SFT-Safety-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-Safety-v1).
#### Conversational Agent
The prompts are sourced from [Nemotron-Agentic-v1](https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1) and [Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1](https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Conversational-Tool-Use-Pivot-v1), with responses generated by Qwen3-235B-A22B-Thinking-2507, Qwen3-32B, Qwen3-235B-A22B-Instruct-2507, and GPT-OSS-120B.
#### Software Engineering Agent
We collect agentless samples from [Nemotron-Cascade-1-SFT-SWE](https://huggingface.co/datasets/nvidia/Nemotron-Cascade-1-SFT-SWE), covering buggy code localization, code repair, and test case generation. Agentic samples are drawn from [SWE-Gym](https://huggingface.co/datasets/SWE-Gym/SWE-Gym), [SWE-rebench](https://huggingface.co/datasets/nebius/SWE-rebench), and [R2E-Gym-Subset](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset).
#### Terminal Agent
The samples are sourced from [Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus).
## Training
We pack all SFT samples into sequences of up to 256K tokens and train the model in a single stage. Empirically, we find that the SFT model reaches optimal performance after approximately 1.5 epochs.
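The packing step can be illustrated with a small sketch. The README does not say which packing algorithm was used, so the first-fit-decreasing heuristic below is only one common choice, not the authors' method. The function name `pack_samples`, the constant `MAX_TOKENS`, the reading of "256K" as 262,144 tokens, and the decision to skip over-long samples are all assumptions made for illustration.

```python
from typing import List

MAX_TOKENS = 256 * 1024  # assumption: "256K" means 262,144 tokens


def pack_samples(lengths: List[int], max_tokens: int = MAX_TOKENS) -> List[List[int]]:
    """Greedy first-fit-decreasing packing (illustrative only).

    Returns a list of packs; each pack is a list of sample indices whose
    token lengths sum to at most max_tokens. Samples longer than max_tokens
    are skipped here (a real pipeline might truncate them instead).
    """
    # Visit samples from longest to shortest so large samples are placed first.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []
    remaining: List[int] = []  # free token budget left in each pack
    for idx in order:
        n = lengths[idx]
        if n > max_tokens:
            continue
        for p, free in enumerate(remaining):
            if n <= free:
                # First existing pack with enough room takes the sample.
                packs[p].append(idx)
                remaining[p] -= n
                break
        else:
            # No existing pack has room: open a new one.
            packs.append([idx])
            remaining.append(max_tokens - n)
    return packs


if __name__ == "__main__":
    toy_lengths = [200_000, 100_000, 60_000, 50_000, 12_000, 300_000]
    for pack in pack_samples(toy_lengths):
        print(pack, sum(toy_lengths[i] for i in pack))
```

With the toy lengths above, the 300,000-token sample is skipped and the remaining five samples fit into two packs. Production packers usually also record per-sample boundaries so that attention can be masked between packed samples; that detail is omitted here.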
| Hyperparameter | Value |
| :--- | :---: |
| Global Batch Size | 64 |
| Packed Sequence Length | 256K |
| Max Learning Rate | 5e-5 |
| Min Learning Rate | 5e-6 |
| Learning Rate Warmup Steps | 200 |
| Scheduler | cosine |
| Max Steps | 40,000 |
| Optimizer | AdamW |
| Optimizer Config | beta_1=0.9<br>beta_2=0.98 |
| Weight Decay | 0.1 |
| Actual Training Steps | 33,000 |
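To make the schedule in the table concrete, here is a minimal sketch of a warmup-plus-cosine learning-rate curve using the listed values. The table only names the scheduler as "cosine", so the linear warmup from zero, the decay spanning warmup end to Max Steps, and the helper name `lr_at` are assumptions. The tokens-per-step figure additionally assumes that the global batch size counts packed sequences and that every sequence is filled to 262,144 tokens, which makes it an upper bound.

```python
import math

MAX_LR = 5e-5
MIN_LR = 5e-6
WARMUP_STEPS = 200
MAX_STEPS = 40_000
GLOBAL_BATCH_SIZE = 64
PACKED_SEQ_LEN = 256 * 1024  # assumption: "256K" means 262,144 tokens


def lr_at(step: int) -> float:
    """Learning rate at a given step (assumed linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 past MAX_STEPS.
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


# Upper bound on tokens consumed per optimizer step under the assumptions above.
tokens_per_step = GLOBAL_BATCH_SIZE * PACKED_SEQ_LEN

if __name__ == "__main__":
    for step in (0, 100, 200, 20_000, 33_000, 40_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
    print(f"tokens per step (upper bound): {tokens_per_step:,}")
```

Under these assumptions, a run that stops at step 33,000 ends before the schedule reaches its floor, so the final learning rate sits somewhat above the 5e-6 minimum.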
## Statistics
| Domain | # Samples |
| :--- | :---: |
| Math | 5,226,364 |
| Science | 2,717,163 |
| General Chat | 13,972,873 |
| Instruction Following | 820,263 |
| Safety | 3,570 |
| Conversational Agent | 822,213 |
| Software Engineering Agent | 439,610 |
| Terminal Agent | 822,213 |
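The per-domain counts above can be totaled and turned into mixture proportions with a few lines of Python. The dictionary name `DOMAIN_COUNTS` and the helper `domain_shares` are hypothetical; the numbers are copied from the table as published.

```python
from typing import Dict

# Sample counts copied from the Statistics table.
DOMAIN_COUNTS: Dict[str, int] = {
    "Math": 5_226_364,
    "Science": 2_717_163,
    "General Chat": 13_972_873,
    "Instruction Following": 820_263,
    "Safety": 3_570,
    "Conversational Agent": 822_213,
    "Software Engineering Agent": 439_610,
    "Terminal Agent": 822_213,
}


def domain_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each domain's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(DOMAIN_COUNTS.values())
    print(f"total samples: {total:,}")
    for name, share in domain_shares(DOMAIN_COUNTS).items():
        print(f"{name:<28} {share:6.2%}")
```

The counts sum to 24,824,269 samples, of which General Chat makes up roughly 56%. These are shares by sample count; shares by token count may differ because sample lengths vary across domains.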
## Release Date
Mar 19, 2026
## License
Your use of this dataset is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
## Citation
```
@article{Nemotron_Cascade_2,
title={Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation},
author={Yang, Zhuolin and Liu, Zihan and Chen, Yang and Dai, Wenliang and Wang, Boxin and Lin, Sheng-Chieh and Lee, Chankyu and Chen, Yangyi and Jiang, Dongfu and He, Jiafan and Pi, Renjie and Lam, Grace and Lee, Nayeon and Bukharin, Alexander and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
year={2026}
}
```
Provider: maas
Created: 2026-03-20



