AmanPriyanshu/tool-reasoning-sft-TOOLS-hermes-reasoning-tool-style-data-cleaned-rectified-115k

Name: AmanPriyanshu/tool-reasoning-sft-TOOLS-hermes-reasoning-tool-style-data-cleaned-rectified-115k
Creator: AmanPriyanshu
Published: 2026-03-10 19:27:19
License: No description available

Hugging Face | Updated 2026-03-10 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-TOOLS-hermes-reasoning-tool-style-data-cleaned-rectified-115k

Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- deep-research
- reasoning
- tool-calling
- agentic
- multi-hop
- search
size_categories:
- 1K<n<10K
---

---

## Agentic Tool-Use SFT Mix

111,295 additional multi-turn agentic trajectories across four task families, following the same strict reasoning + tool-call FSM format. Combined with the original 3,827 deep-research trajectories, the dataset totals **115,122 samples**.

### Distribution

| Category | Samples | Full | Compact |
|---|---|---|---|
| Deep Research (original) | 3,827 | 100% | n/a |
| Multi-Turn Tool Orchestration | 45,776 | 54% | 46% |
| Deep Research | 34,282 | 71% | 29% |
| Codebase Retrieval | 17,473 | 69% | 31% |
| Database Interaction | 13,764 | 69% | 31% |
| **Total** | **115,122** | | |

### Schema

Two columns: `messages` (a JSON string encoding a list of role/content dicts) and `source` (category label).

### Cleaning

All trajectories were validated against the strict FSM. Stray turns were stripped, missing reasoning bridges were inserted, and consecutive reasoning turns were merged. About 11k trajectories required at least one repair.

```
system → user → reasoning → tool_call → tool_output → reasoning → tool_call → ...
       → reasoning → answer
```

## Validated Transitions

```
system      → user
user        → reasoning
reasoning   → tool_call | answer
tool_call   → tool_output
tool_output → reasoning
answer      → user (multi-turn only)
```

## Usage

```py
import json, random
from huggingface_hub import hf_hub_download
import pyarrow.parquet as pq

# Repo id exactly as written in the card. It omits the "TOOLS" segment that
# appears in this page's dataset name, so adjust it if the download fails.
REPO = "AmanPriyanshu/tool-reasoning-sft-hermes-reasoning-tool-style-data-cleaned-rectified-115k"
FILES = ["compiled_data.parquet", "data.parquet"]

for fname in FILES:
    print("=" * 70)
    print(f"Downloading {fname}...")
    local = hf_hub_download(REPO, fname, repo_type="dataset")
    t = pq.read_table(local)
    print(f"Rows: {t.num_rows:,} | Columns: {t.column_names}")

    # Pick one random row and decode its JSON-encoded message list.
    idx = random.randint(0, t.num_rows - 1)
    row = {col: t.column(col)[idx].as_py() for col in t.column_names}
    msgs = json.loads(row["messages"])
    meta = {k: v for k, v in row.items() if k != "messages"}
    print(f"\nRow {idx} | meta={meta} | {len(msgs)} turns")
    print(f"Roles: {' -> '.join(m['role'] for m in msgs[:20])}{'...' if len(msgs) > 20 else ''}\n")

    # Print each turn, truncating the system prompt and any long content.
    for m in msgs:
        content = m["content"]
        if m["role"] == "system":
            content = content[:200] + "..."
        elif len(content) > 300:
            content = content[:300] + "..."
        print(f"[{m['role']}]\n{content}\n")
    print()
```

## License

Apache-2.0
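
## Example: Checking Role Transitions

The transition list under Validated Transitions can be checked mechanically. The sketch below is not the author's cleaning pipeline. It assumes that each message's `role` value uses exactly the labels shown in that list (`system`, `user`, `reasoning`, `tool_call`, `tool_output`, `answer`) and that every trajectory starts with `system` and ends with `answer`, as the FSM diagram suggests. `ALLOWED` and `check_trajectory` are hypothetical names introduced for illustration.

```python
import json

# Allowed next roles for each role, copied from the "Validated Transitions" list.
# Assumption: the dataset's `role` strings match these labels exactly.
ALLOWED = {
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
    "answer": {"user"},  # multi-turn only
}


def check_trajectory(messages_json):
    """Return a list of problems found in one `messages` value (empty if valid)."""
    roles = [m["role"] for m in json.loads(messages_json)]
    if not roles:
        return ["empty trajectory"]

    problems = []
    # Assumed boundary conditions, taken from the FSM diagram in the card.
    if roles[0] != "system":
        problems.append(f"first role is {roles[0]}, expected system")
    if roles[-1] != "answer":
        problems.append(f"last role is {roles[-1]}, expected answer")

    # Check every adjacent pair of roles against the transition table.
    for pos, (prev, nxt) in enumerate(zip(roles, roles[1:]), start=1):
        if nxt not in ALLOWED.get(prev, set()):
            problems.append(f"message {pos}: {prev} -> {nxt} is not allowed")
    return problems


def make_messages(roles):
    """Build a JSON `messages` string with empty content, for demonstration only."""
    return json.dumps([{"role": r, "content": ""} for r in roles])


good = make_messages(
    ["system", "user", "reasoning", "tool_call", "tool_output", "reasoning", "answer"]
)
bad = make_messages(
    ["system", "user", "tool_call", "tool_output", "reasoning", "answer"]
)
print(check_trajectory(good))
print(check_trajectory(bad))
```

To run the check on real data, pass `row["messages"]` from the Usage snippet to `check_trajectory`. If the actual role labels differ from the ones assumed here, update the keys in `ALLOWED` accordingly.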

Provider:

AmanPriyanshu
