
AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k

Hugging Face · Updated 2026-03-14 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-TOOLS-toucan-1.5m-sft-tool-use-data-cleaned-rectified-333k
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- agent
- tool-use
- reasoning
- multi-turn
pretty_name: Toucan - OSS High Quality Hermes Reasoning Format
size_categories:
- 100K<n<1M
---

# Toucan - OSS High Quality (Hermes Reasoning Format)

Filtered and restructured subset of [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M).

Format Inspiration: [SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified](https://huggingface.co/datasets/SupritiVijay/dr-tulu-sft-deep-research-agent-data-cleaned-rectified)

**Filters applied:** OSS split only · `overall_score > 3.0` · valid role transitions only

**Size:** ~333K examples

---

## Format

Each example is a multi-turn conversation with strict role transitions:

```
system → user → reasoning → tool_call → tool_output → reasoning → ... → answer
```

| Role | Content |
|---|---|
| `system` | Tool schemas + instructions |
| `user` | Question |
| `reasoning` | `<think>...</think>` |
| `tool_call` | `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` |
| `tool_output` | `<tool_response>...</tool_response>` |
| `answer` | `<answer>...</answer>` |

Multi-turn conversations follow `answer → user` transitions.

---

## Changes from Original

- Mapped `reasoning_content` → `<think>` blocks
- Parsed tool call JSON; stripped `call_id` and null schema fields
- Inserted synthetic `<think>` bridges where `tool_output → answer` transitions were missing reasoning
- Wrapped final answers in `<answer>` tags
- Dropped rows with invalid transitions (~1.2%)

---

**Original dataset:** [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) - Apache 2.0
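The role-transition rule and the `<tool_call>` wrapper described above can be illustrated with a minimal Python sketch. The transition table is an assumption inferred from the format description (including the requirement that a `reasoning` turn sits between `tool_output` and `answer`); it is not the dataset author's validator. The function names and the `get_weather` tool are hypothetical, and the sketch works on a plain list of role strings because the dataset's column names are not stated here.

```python
import json
import re

# Assumed transition table, inferred from the format description;
# the filter actually used to build the dataset may differ.
ALLOWED_NEXT = {
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
    "answer": {"user"},  # multi-turn: a new user question may follow an answer
}

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def is_valid_conversation(roles):
    """Return True if the role sequence follows the assumed transitions."""
    if not roles or roles[0] != "system" or roles[-1] != "answer":
        return False
    return all(
        nxt in ALLOWED_NEXT.get(cur, set())
        for cur, nxt in zip(roles, roles[1:])
    )


def parse_tool_call(content):
    """Extract and decode the JSON payload of the first <tool_call> block."""
    match = TOOL_CALL_RE.search(content)
    if match is None:
        raise ValueError("no <tool_call> block found")
    return json.loads(match.group(1))


# Hypothetical example turns, not taken from the dataset.
roles = ["system", "user", "reasoning", "tool_call",
         "tool_output", "reasoning", "answer"]
call = parse_tool_call(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
print(is_valid_conversation(roles), call["name"], call["arguments"])
```

A sequence that jumps straight from `tool_output` to `answer` is rejected under this table, which corresponds to the case the card says was repaired with a synthetic `<think>` bridge.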
Provider:
AmanPriyanshu