AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k

Name: AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k
Creator: AmanPriyanshu
Published: 2026-03-14 18:04:20
License: 暂无描述

Hugging Face2026-03-14 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - text-generation language: - en tags: - agent - tool-use - reasoning - multi-turn pretty_name: Toucan - OSS High Quality Hermes Reasoning Format size_categories: - 100K<n<1M --- # Toucan - OSS High Quality (Hermes Reasoning Format) Filtered and restructured subset of [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M). Format Inspiration: [SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified](https://huggingface.co/datasets/SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified) **Filters applied:** OSS split only · `overall_score > 3.0` · valid role transitions only **Size:** ~333K examples --- ## Format Each example is a multi-turn conversation with strict role transitions: ``` system → user → reasoning → tool_call → tool_output → reasoning → ... → answer ``` | Role | Content | |---|---| | `system` | Tool schemas + instructions | | `user` | Question | | `reasoning` | `<think>...</think>` | | `tool_call` | `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` | | `tool_output` | `<tool_response>...</tool_response>` | | `answer` | `<answer>...</answer>` | Multi-turn conversations follow `answer → user` transitions. --- ## Changes from Original - Mapped `reasoning_content` → `<think>` blocks - Parsed tool call JSON; stripped `call_id` and null schema fields - Inserted synthetic `<think>` bridges where `tool_output → answer` transitions were missing reasoning - Wrapped final answers in `<answer>` tags - Dropped rows with invalid transitions (~1.2%) --- **Original dataset:** [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) - Apache 2.0

提供机构：

AmanPriyanshu

5,000+

优质数据集

54 个

任务类型

进入经典数据集