five

tuandunghcmut/nemotron-agentic-v1

收藏
Hugging Face2026-03-11 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/tuandunghcmut/nemotron-agentic-v1
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-4.0 language: - en tags: - tool-use - agentic - multi-turn - openai-format - reasoning-content task_categories: - text-generation configs: - config_name: default data_files: - split: interactive_agent path: data/interactive_agent-* - split: tool_calling path: data/tool_calling-* dataset_info: features: - name: messages dtype: string - name: tools_json dtype: string splits: - name: interactive_agent num_bytes: 539784936 num_examples: 19028 - name: tool_calling num_bytes: 5318087893 num_examples: 316094 download_size: 2278955765 dataset_size: 5857872829 --- # nemotron-agentic-v1 Cleaned version of [nvidia/Nemotron-Agentic-v1](https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1) converted to a uniform OpenAI-compatible tool-calling format. ## Source - **Original dataset:** `nvidia/Nemotron-Agentic-Tool-Use-v1` — multi-turn conversations where LLMs decompose goals, call tools, and reason over tool outputs. - **Splits:** `interactive_agent`, `tool_calling` ## Schema | Column | Type | Description | |--------|------|-------------| | `messages` | JSON string | List of message dicts | | `tools_json` | JSON string | List of tool definitions in OpenAI function-calling format | ### Message fields | Field | Present on roles | Notes | |-------|-----------------|-------| | `role` | all | `system` / `user` / `assistant` / `tool` | | `content` | all | `null` when assistant emits tool calls | | `reasoning_content` | `assistant` | Thinking; set when both content and tool_calls exist | | `tool_calls` | `assistant` | `arguments` is a JSON **string** | | `tool_call_id` | `tool` | Matches triggering tool call `id` | ## Processing Rules 1. **Arguments normalisation**: Raw `tool_calling` split stores `arguments` as a Python dict; converted to JSON string. 2. **Dict content normalisation**: Raw `tool_calling` split sometimes has tool-result `content` as a Python dict; serialised to JSON string. 3. **content / tool_calls mutual exclusivity**: assistant content moved to `reasoning_content` when `tool_calls` is non-empty. 4. **Schema workaround**: `interactive_agent` loaded via streaming; `tool_calling` loaded directly from JSONL (both avoid PyArrow mixed-type inference errors in the source). ## Statistics | Split | Rows | |-------|------| | `interactive_agent` | 19,028 | | `tool_calling` | 316,094 | ## Loading ```python from datasets import load_dataset import json ds = load_dataset("tuandunghcmut/nemotron-agentic-v1", split="interactive_agent") sample = ds[0] messages = json.loads(sample["messages"]) tools = json.loads(sample["tools_json"]) ``` ## Citation ```bibtex @misc{nemotron_agentic_v1, title = {Nemotron-Agentic-Tool-Use-v1}, author = {NVIDIA}, year = {2025}, url = {https://huggingface.co/datasets/nvidia/Nemotron-Agentic-v1} } ```
提供机构:
tuandunghcmut
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作