AmanPriyanshu/tool-reasoning-sft-CODING-Nemotron-Terminal-Corpus-data-cleaned-rectified

Name: AmanPriyanshu/tool-reasoning-sft-CODING-Nemotron-Terminal-Corpus-data-cleaned-rectified
Creator: AmanPriyanshu
Published: 2026-03-03 11:19:55
License: not specified in the listing (the dataset card states CC-BY-4.0)

Source: Hugging Face
Updated: 2026-03-03
Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-CODING-Nemotron-Terminal-Corpus-data-cleaned-rectified

Description:

---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
tags:
- terminal
- agent
- tool-use
- reasoning
- sft
- multi-turn
- code
- math
- software-engineering
size_categories:
- 100K<n<1M
---

# Nemotron-Terminal-Corpus — Cleaned & Rectified

Cleaned and restructured version of [nvidia/Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus).

The original dataset contains ~366K terminal agent trajectories built by NVIDIA using the Terminal-Task-Gen pipeline across math, code, SWE, and synthetic skill-based domains. This version converts the JSON-action format into a strict multi-turn conversation structure with explicit reasoning traces, validated JSON tool calls, and proper role transitions.

Original Dataset: [nvidia/Nemotron-Terminal-Corpus](https://huggingface.co/datasets/nvidia/Nemotron-Terminal-Corpus)

## What Changed

### Original Format (JSON Actions)

```
- user: [system prompt + task description + terminal state]
- assistant: <think>...</think> {"analysis": "...", "plan": "...", "commands": [...], "task_complete": false}
- user: [terminal output]
- assistant: <think>...</think> {"analysis": "...", "plan": "...", "commands": [...], "task_complete": true}
```

### New Format (Multi-Turn with Reasoning)

```
- system: System prompt with tool-use protocol + execute_commands schema
- user: Task description + terminal state
- reasoning: <think>analysis + plan + thinking</think>
- tool_call: <tool_call>{"name": "execute_commands", "arguments": {"commands": [...]}}</tool_call>
- tool_output: <tool_response>terminal output</tool_response>
- reasoning: <think>...</think>
- ...
- answer: <answer>final summary</answer>
```

## Files

| File | Contents | Split Values |
|---|---|---|
| `dataset_adapters.parquet` | Math, Code, SWE adapter trajectories | `dataset_adapters` |
| `skill.parquet` | Synthetic skill-based tasks | `easy`, `medium`, `mixed` |

## Message Roles

| Role | Content |
|---|---|
| `system` | Terminal agent instructions + tool-use protocol + execute_commands schema |
| `user` | Task description + initial terminal state |
| `reasoning` | `<think>…</think>`: analysis, plan, and chain-of-thought |
| `tool_call` | `<tool_call>{"name": "execute_commands", "arguments": {"commands": [...]}}</tool_call>` |
| `tool_output` | `<tool_response>…</tool_response>`: terminal output |
| `answer` | `<answer>…</answer>`: final task summary |

## License

CC-BY-4.0 (same as original dataset).

## Citation

```bibtex
@misc{pi2026dataengineeringscalingllm,
  title={On Data Engineering for Scaling LLM Terminal Capabilities},
  author={Renjie Pi and Grace Lam and Mohammad Shoeybi and Pooya Jannaty and Bryan Catanzaro and Wei Ping},
  year={2026},
  eprint={2602.21193},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.21193},
}
```
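
The format change described under "What Changed" can be illustrated with a minimal Python sketch. This is not the author's conversion script. It assumes that each message is a dict with `role` and `content` keys, that the JSON action follows the closing `</think>` tag, and that the analysis, plan, and thinking are joined in that order. The helper names `convert_assistant_turn` and `parse_tool_call` are hypothetical, and the sample command is a placeholder string, since the card does not specify the shape of the `commands` entries. The final `answer` turn is not handled.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def convert_assistant_turn(text):
    """Split one original-format assistant turn into new-format messages.

    Hypothetical helper: returns a `reasoning` message followed by a
    `tool_call` message, mirroring the role names in the dataset card.
    """
    match = THINK_RE.search(text)
    thinking = match.group(1).strip() if match else ""
    # Assumption: everything after </think> is the JSON action object.
    action = json.loads(text[match.end():] if match else text)
    # The card describes the reasoning content as analysis + plan + thinking.
    parts = [action.get("analysis", ""), action.get("plan", ""), thinking]
    reasoning = "\n\n".join(p for p in parts if p)
    call = {
        "name": "execute_commands",
        "arguments": {"commands": action.get("commands", [])},
    }
    return [
        {"role": "reasoning", "content": f"<think>{reasoning}</think>"},
        {"role": "tool_call", "content": f"<tool_call>{json.dumps(call)}</tool_call>"},
    ]


def parse_tool_call(content):
    """Extract the command list from a `tool_call` message, validating the JSON."""
    match = TOOL_CALL_RE.search(content)
    if match is None:
        raise ValueError("no <tool_call> block found")
    call = json.loads(match.group(1))  # raises on malformed JSON
    if call.get("name") != "execute_commands":
        raise ValueError(f"unexpected tool name: {call.get('name')!r}")
    return call["arguments"]["commands"]


# Placeholder turn in the original JSON-action format.
turn = (
    '<think>check files</think> '
    '{"analysis": "empty dir", "plan": "list files", '
    '"commands": ["ls -la"], "task_complete": false}'
)
messages = convert_assistant_turn(turn)
commands = parse_tool_call(messages[1]["content"])
```

Round-tripping through `parse_tool_call` is one simple way to check that a converted `tool_call` message still carries valid JSON with the expected tool name.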

Provided by:

AmanPriyanshu
