nvidia/Nemotron-Post-Training-Dataset-v1
收藏Hugging Face2025-08-25 更新2025-10-25 收录
下载链接:
https://hf-mirror.com/datasets/nvidia/Nemotron-Post-Training-Dataset-v1
下载链接
链接失效反馈官方服务:
资源简介:
Nemotron-Post-Training-Dataset-v1是一个支持改进Llama instruct模型数学、代码、STEM、一般推理和工具调用能力的SFT数据集。数据集包括uuid、license、generator、version、category、reasoning、messages和metadata等特征。数据分为聊天、代码、数学、STEM和工具调用等类别,每个类别都有指定的示例数和字节大小。数据集遵循CC BY 4.0许可,可用于商业和非商业用途。数据集描述为合成,数据收集和标注方法也是合成。数据集格式为文本数据,适合训练AI代理系统、聊天机器人、RAG系统和其他AI应用程序。
The Nemotron-Post-Training-Dataset-v1 is a compilation of SFT data that supports improvements of math, code, stem, general reasoning, and tool calling capabilities of the original Llama instruct model [Llama-3.3-Nemotron-Super-49B-v1.5](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5). This dataset is ready for commercial/non-commercial use and is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). The data is split into different categories including chat, code, math, stem, and tool_calling, each with a specified number of examples and byte size. The dataset is described as synthetic, with data collection and labeling methods also being synthetic. The dataset is formatted as text data and is suitable for training AI agent systems, chatbots, RAG systems, and other AI-powered applications.
提供机构:
nvidia



