AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k
Source: Hugging Face · Updated 2026-03-14 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- agent
- tool-use
- reasoning
- multi-turn
pretty_name: Toucan - OSS High Quality Hermes Reasoning Format
size_categories:
- 100K<n<1M
---
# Toucan - OSS High Quality (Hermes Reasoning Format)
Filtered and restructured subset of [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M).

Format inspiration: [SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified](https://huggingface.co/datasets/SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified)

**Filters applied:** OSS split only · `overall_score > 3.0` · valid role transitions only

**Size:** ~333K examples

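
The first two filters can be pictured with a small row-level predicate. This is only a sketch under stated assumptions: `overall_score` is the field named above, but the `subset` column name and the row layout are hypothetical, and the actual filtering script is not shown in this card.

```python
from typing import Any, Dict, Iterable, List


def keep_row(row: Dict[str, Any], min_score: float = 3.0) -> bool:
    """Return True when a row passes the split and score filters."""
    # "subset" is a hypothetical column name used only for illustration.
    if row.get("subset") != "OSS":
        return False
    score = row.get("overall_score")
    # Strictly greater than the threshold, mirroring `overall_score > 3.0`.
    return score is not None and score > min_score


def filter_rows(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the rows that pass keep_row."""
    return [row for row in rows if keep_row(row)]


# Made-up rows, not taken from the dataset.
sample_rows = [
    {"id": 1, "subset": "OSS", "overall_score": 4.5},
    {"id": 2, "subset": "OSS", "overall_score": 3.0},    # not strictly above 3.0
    {"id": 3, "subset": "other", "overall_score": 5.0},  # wrong split
    {"id": 4, "subset": "OSS", "overall_score": None},   # missing score
]
kept_ids = [row["id"] for row in filter_rows(sample_rows)]
```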
---
## Format
Each example is a multi-turn conversation with strict role transitions:
```
system → user → reasoning → tool_call → tool_output → reasoning → ... → answer
```
| Role | Content |
|---|---|
| `system` | Tool schemas + instructions |
| `user` | Question |
| `reasoning` | `<think>...</think>` |
| `tool_call` | `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` |
| `tool_output` | `<tool_response>...</tool_response>` |
| `answer` | `<answer>...</answer>` |

In conversations with more than one user turn, each `answer` is followed by the next `user` message (`answer → user`), and the cycle repeats.

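
The role transitions described in this section can be checked with a small lookup table. The sketch below is an interpretation of the diagram, not the card author's validator: the `ALLOWED` table, the rule that a conversation starts with `system` and ends with `answer`, and the assumption that `reasoning` may lead straight to `answer` are all inferred from the text above.

```python
from typing import Dict, List, Set

# Allowed next roles for each role, inferred from the transition diagram.
ALLOWED: Dict[str, Set[str]] = {
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
    "answer": {"user"},  # multi-turn continuation
}


def is_valid_sequence(roles: List[str]) -> bool:
    """Check that a list of roles follows the assumed transition rules."""
    # Assumption: every conversation opens with system and closes with answer.
    if not roles or roles[0] != "system" or roles[-1] != "answer":
        return False
    # Every adjacent pair must be an allowed transition; unknown roles fail.
    return all(
        nxt in ALLOWED.get(cur, set())
        for cur, nxt in zip(roles, roles[1:])
    )


single_turn = [
    "system", "user", "reasoning", "tool_call",
    "tool_output", "reasoning", "answer",
]
multi_turn = single_turn + ["user", "reasoning", "answer"]
# tool_output followed directly by answer skips the reasoning step.
missing_reasoning = [
    "system", "user", "reasoning", "tool_call", "tool_output", "answer",
]

checks = [
    is_valid_sequence(single_turn),
    is_valid_sequence(multi_turn),
    is_valid_sequence(missing_reasoning),
]
```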
---
## Changes from Original
- Mapped `reasoning_content` → `<think>` blocks
- Parsed tool call JSON; stripped `call_id` and null schema fields
- Inserted synthetic `<think>` bridges where `tool_output → answer` transitions were missing reasoning
- Wrapped final answers in `<answer>` tags
- Dropped rows with invalid transitions (~1.2%)
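
The changes listed above might look roughly like the following. This is a hedged sketch, not the conversion script used for the dataset: the input shapes, the helper names, and the placeholder text of the synthetic bridge are all hypothetical.

```python
import json
from typing import Any, Dict, List


def strip_nulls(value: Any) -> Any:
    """Recursively drop dictionary keys whose value is None."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def wrap_think(reasoning_content: str) -> str:
    """Map a reasoning string to a <think> block."""
    return f"<think>{reasoning_content}</think>"


def format_tool_call(raw_json: str) -> str:
    """Parse a tool call, drop call_id, and wrap it in <tool_call> tags."""
    call = json.loads(raw_json)
    call.pop("call_id", None)
    return f"<tool_call>{json.dumps(call)}</tool_call>"


def wrap_answer(text: str) -> str:
    """Wrap a final answer in <answer> tags."""
    return f"<answer>{text}</answer>"


def add_bridges(turns: List[Dict[str, str]], bridge_text: str) -> List[Dict[str, str]]:
    """Insert a reasoning turn wherever tool_output is followed by answer."""
    out: List[Dict[str, str]] = []
    for turn in turns:
        if turn["role"] == "answer" and out and out[-1]["role"] == "tool_output":
            # bridge_text is a placeholder; the real synthetic text is not documented here.
            out.append({"role": "reasoning", "content": wrap_think(bridge_text)})
        out.append(turn)
    return out


# Made-up inputs for illustration only.
raw_call = '{"call_id": "abc", "name": "get_weather", "arguments": {"city": "Paris"}}'
formatted_call = format_tool_call(raw_call)

schema = {"name": "get_weather", "description": None, "parameters": {"city": {"type": "string", "default": None}}}
clean_schema = strip_nulls(schema)

turns = [
    {"role": "tool_output", "content": "<tool_response>18C</tool_response>"},
    {"role": "answer", "content": wrap_answer("It is 18C in Paris.")},
]
bridged_roles = [t["role"] for t in add_bridges(turns, "The tool returned the temperature.")]
```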
---
**Original dataset:** [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) - Apache 2.0

Provider: AmanPriyanshu



