tuandunghcmut/nemotron-agentic-v1

Name: tuandunghcmut/nemotron-agentic-v1
Creator: tuandunghcmut
Published: 2026-03-11 09:39:52
License: 暂无描述

Hugging Face2026-03-11 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/tuandunghcmut/nemotron-agentic-v1

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: cc-by-4.0 language: - en tags: - tool-use - agentic - multi-turn - openai-format - reasoning-content task_categories: - text-generation configs: - config_name: default data_files: - split: interactive_agent path: data/interactive_agent-* - split: tool_calling path: data/tool_calling-* dataset_info: features: - name: messages dtype: string - name: tools_json dtype: string splits: - name: interactive_agent num_bytes: 539784936 num_examples: 19028 - name: tool_calling num_bytes: 5318087893 num_examples: 316094 download_size: 2278955765 dataset_size: 5857872829 --- # nemotron-agentic-v1 Cleaned version of [nvidia/Nemotron-Agentic-v1](https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1) converted to a uniform OpenAI-compatible tool-calling format. ## Source - **Original dataset:** `nvidia/Nemotron-Agentic-Tool-Use-v1` — multi-turn conversations where LLMs decompose goals, call tools, and reason over tool outputs. - **Splits:** `interactive_agent`, `tool_calling` ## Schema | Column | Type | Description | |--------|------|-------------| | `messages` | JSON string | List of message dicts | | `tools_json` | JSON string | List of tool definitions in OpenAI function-calling format | ### Message fields | Field | Present on roles | Notes | |-------|-----------------|-------| | `role` | all | `system` / `user` / `assistant` / `tool` | | `content` | all | `null` when assistant emits tool calls | | `reasoning_content` | `assistant` | Thinking; set when both content and tool_calls exist | | `tool_calls` | `assistant` | `arguments` is a JSON **string** | | `tool_call_id` | `tool` | Matches triggering tool call `id` | ## Processing Rules 1. **Arguments normalisation**: Raw `tool_calling` split stores `arguments` as a Python dict; converted to JSON string. 2. **Dict content normalisation**: Raw `tool_calling` split sometimes has tool-result `content` as a Python dict; serialised to JSON string. 3. **content / tool_calls mutual exclusivity**: assistant content moved to `reasoning_content` when `tool_calls` is non-empty. 4. **Schema workaround**: `interactive_agent` loaded via streaming; `tool_calling` loaded directly from JSONL (both avoid PyArrow mixed-type inference errors in the source). ## Statistics | Split | Rows | |-------|------| | `interactive_agent` | 19,028 | | `tool_calling` | 316,094 | ## Loading ```python from datasets import load_dataset import json ds = load_dataset("tuandunghcmut/nemotron-agentic-v1", split="interactive_agent") sample = ds[0] messages = json.loads(sample["messages"]) tools = json.loads(sample["tools_json"]) ``` ## Citation ```bibtex @misc{nemotron_agentic_v1, title = {Nemotron-Agentic-Tool-Use-v1}, author = {NVIDIA}, year = {2025}, url = {https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1} } ```

提供机构：

tuandunghcmut

5,000+

优质数据集

54 个

任务类型

进入经典数据集