nvidia/Nemotron-Instruction-Following-Chat-v1
收藏Hugging Face2025-12-15 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/nvidia/Nemotron-Instruction-Following-Chat-v1
下载链接
链接失效反馈官方服务:
资源简介:
Nemotron-Instruction-Following-Chat-v1数据集旨在广泛增强模型的交互能力,涵盖开放式聊天、精确指令遵循和可靠的结构化输出生成。它结合了来自[Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2)的更新聊天数据(扩展到多轮对话)以及由GPT-OSS-120B和Qwen3-235B等强大前沿模型生成的合成对话。该数据集包含两个子集:聊天与指令遵循子集和结构化输出子集。前者旨在提高模型在单轮和多轮对话中的参与能力,后者旨在提高模型在JSON格式下遵循输出格式指令的能力。数据集采用混合方法(人工、合成、自动化)进行数据收集和标注,格式为JSONL,总样本数为430,978,磁盘大小约为6.6 GB。数据集可自由用于训练和评估,适用于商业用途。
The Nemotron-Instruction-Following-Chat-v1 dataset is designed to broadly strengthen the model’s interactive capabilities, spanning open-ended chat, precise instruction following, and reliable structured output generation. It combines refreshed chat data from [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) (extended to multi-turn) with synthetic dialogues produced by strong frontier models such as GPT-OSS-120B and Qwen3-235B variants. The dataset contains two subsets: Chat and Instruction Following, and Structured Outputs. The former aims to improve the models capabilities in engaging with users in single and multi-turn setups, while the latter focuses on improving the models ability to follow output formatting instructions under JSON schema constraints. The dataset uses a hybrid method (human, synthetic, automated) for data collection and labeling, is formatted in JSONL, and contains a total of 430,978 samples with a disk size of ~6.6 GB. It is freely available for training and evaluation and is ready for commercial use.
提供机构:
nvidia



