tuandunghcmut/nemotron-agentic-v1
收藏Hugging Face2026-03-11 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/tuandunghcmut/nemotron-agentic-v1
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
language:
- en
tags:
- tool-use
- agentic
- multi-turn
- openai-format
- reasoning-content
task_categories:
- text-generation
configs:
- config_name: default
data_files:
- split: interactive_agent
path: data/interactive_agent-*
- split: tool_calling
path: data/tool_calling-*
dataset_info:
features:
- name: messages
dtype: string
- name: tools_json
dtype: string
splits:
- name: interactive_agent
num_bytes: 539784936
num_examples: 19028
- name: tool_calling
num_bytes: 5318087893
num_examples: 316094
download_size: 2278955765
dataset_size: 5857872829
---
# nemotron-agentic-v1
Cleaned version of [nvidia/Nemotron-Agentic-v1](https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1) converted to a uniform OpenAI-compatible tool-calling format.
## Source
- **Original dataset:** `nvidia/Nemotron-Agentic-Tool-Use-v1` — multi-turn conversations where LLMs decompose goals, call tools, and reason over tool outputs.
- **Splits:** `interactive_agent`, `tool_calling`
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `messages` | JSON string | List of message dicts |
| `tools_json` | JSON string | List of tool definitions in OpenAI function-calling format |
### Message fields
| Field | Present on roles | Notes |
|-------|-----------------|-------|
| `role` | all | `system` / `user` / `assistant` / `tool` |
| `content` | all | `null` when assistant emits tool calls |
| `reasoning_content` | `assistant` | Thinking; set when both content and tool_calls exist |
| `tool_calls` | `assistant` | `arguments` is a JSON **string** |
| `tool_call_id` | `tool` | Matches triggering tool call `id` |
## Processing Rules
1. **Arguments normalisation**: Raw `tool_calling` split stores `arguments` as a Python dict; converted to JSON string.
2. **Dict content normalisation**: Raw `tool_calling` split sometimes has tool-result `content` as a Python dict; serialised to JSON string.
3. **content / tool_calls mutual exclusivity**: assistant content moved to `reasoning_content` when `tool_calls` is non-empty.
4. **Schema workaround**: `interactive_agent` loaded via streaming; `tool_calling` loaded directly from JSONL (both avoid PyArrow mixed-type inference errors in the source).
## Statistics
| Split | Rows |
|-------|------|
| `interactive_agent` | 19,028 |
| `tool_calling` | 316,094 |
## Loading
```python
from datasets import load_dataset
import json
ds = load_dataset("tuandunghcmut/nemotron-agentic-v1", split="interactive_agent")
sample = ds[0]
messages = json.loads(sample["messages"])
tools = json.loads(sample["tools_json"])
```
## Citation
```bibtex
@misc{nemotron_agentic_v1,
title = {Nemotron-Agentic-Tool-Use-v1},
author = {NVIDIA},
year = {2025},
url = {https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1}
}
```
提供机构:
tuandunghcmut



