nvidia/Nemotron-Instruction-Following-Chat-v1

Name: nvidia/Nemotron-Instruction-Following-Chat-v1
Creator: nvidia
Published: 2025-12-15 05:27:15
License: No description available

Source: Hugging Face. Updated: 2025-12-15. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1

Description:

The Nemotron-Instruction-Following-Chat-v1 dataset is designed to broadly strengthen a model's interactive capabilities, spanning open-ended chat, precise instruction following, and reliable structured output generation. It combines refreshed chat data from [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) (extended to multi-turn) with synthetic dialogues produced by strong frontier models such as GPT-OSS-120B and Qwen3-235B variants. The dataset contains two subsets: Chat and Instruction Following, and Structured Outputs. The former aims to improve the model's ability to engage with users in single-turn and multi-turn setups, while the latter focuses on improving the model's ability to follow output formatting instructions under JSON schema constraints. The data was collected and labeled with a hybrid method (human, synthetic, automated), is formatted as JSONL, and contains a total of 430,978 samples with a disk size of ~6.6 GB. It is freely available for training and evaluation and is ready for commercial use.
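Since the description states that the data is formatted as JSONL but does not list the record fields, a first step after downloading might be to inspect which top-level keys the records actually carry. The following is a minimal sketch using only the Python standard library; the helper names `iter_jsonl` and `summarize_keys` are hypothetical, and no field names are assumed.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def summarize_keys(path: Path, limit: int = 1000) -> Counter:
    """Count top-level keys over the first `limit` records.

    No schema is assumed: records that are not JSON objects are ignored.
    """
    counts: Counter = Counter()
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts
```

A call such as `summarize_keys(Path("chat.jsonl"))` would then report how often each key appears in the first 1,000 records. The filename `chat.jsonl` is a placeholder, not a file name confirmed by the listing.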

Provider:

nvidia

5,000+

优质数据集

54 个

任务类型

进入经典数据集