AmanPriyanshu/tool-reasoning-sft-RESEARCH-REDSearcher_SFT_10K

Name: AmanPriyanshu/tool-reasoning-sft-RESEARCH-REDSearcher_SFT_10K
Creator: AmanPriyanshu
Published: 2026-03-24 19:10:57
License: no description available

Hugging Face, updated 2026-03-24, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-RESEARCH-REDSearcher_SFT_10K


Resource description:

---
task_categories:
- text-generation
language:
- en
- zh
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- deep-search
- multi-step-reasoning
size_categories:
- 1K<n<10K
---

# REDSearcher SFT 10K: Cleaned & Rectified

~8,850 multi-turn deep-search agent trajectories converted into a strict reasoning + tool-call format with validated FSM transitions.

## Origin

Derived from [Zchu/REDSearcher_SFT_10K](https://huggingface.co/datasets/Zchu/REDSearcher_SFT_10K).

REDSearcher is a deep search assistant dataset featuring rigorous, multi-step, multi-source investigations. Each trajectory contains a complex research question answered through iterative search → visit → reason → answer cycles, with extensive chain-of-thought reasoning already present in the source data.

📄 **Paper:** [REDSearcher: A Novel Framework for Real-time Exploration and Discovery](https://huggingface.co/datasets/Zchu/REDSearcher_SFT_10K) *(see dataset card for citation)*

## Format

Each row contains a structured multi-turn conversation with explicit reasoning traces and validated tool calls.

### Message Roles

| Role | Content |
|---|---|
| `system` | Tool-use protocol + JSON tool schemas + deep-search instructions |
| `user` | Research question or follow-up |
| `reasoning` | `<think>…</think>`: the model's step-by-step reasoning |
| `tool_call` | `<tool_call>{"name": "...", "arguments": {...}}</tool_call>`: function invocation |
| `tool_output` | `<tool_response>…</tool_response>`: tool execution result |
| `answer` | `<answer>…</answer>`: final synthesized response |

### Trajectory Structure

```
system → user → reasoning → [tool_call → tool_output → reasoning →]* answer
```

Trajectories range from ~30 to ~360 turns, with 14–152 tool calls per row (avg ~64).

## Schema

Single Parquet file with zstd compression.

| Column | Type | Description |
|---|---|---|
| `messages` | string | Converted conversation (JSON list of `{role, content}`) |
| `language` | string | Language of the query (`en` or `zh`) |

## Tools

5 tools available per trajectory:

| Tool | Description |
|---|---|
| `search` | Google web search with multiple queries |
| `visit` | Visit webpage(s) and extract content |
| `google_scholar` | Academic publication search |
| `google_maps` | Google Maps place search |
| `PythonInterpreter` | Sandboxed Python code execution |

## Conversion Details

- Source data already uses `<think>`, `<tool_call>`, `<tool_response>`, and `<answer>` XML tags, so conversion is **decomposition** of compound assistant messages into separate FSM turns, not synthesis
- Assistant messages with `<think>` + `<tool_call>` are split into separate `reasoning` + `tool_call` turns
- Assistant messages with `<think>` + `<answer>` are split into separate `reasoning` + `answer` turns
- User messages containing `<tool_response>` are mapped to `tool_output` turns
- Tool schemas are extracted from the `<tools>` XML block in the system prompt and converted to clean JSON
- Bridge reasoning is synthesized only when the FSM requires it (rare, since the source already has `<think>` blocks)
- ~88.5% conversion rate; failures are malformed source rows with `reasoning→tool_output` transition violations (the assistant message has `<think>` but no `<tool_call>`/`<answer>`, and is followed immediately by `<tool_response>`)
- Two validation layers: FSM transition check + content-tag non-empty check

## Usage

```py
import json
import random
import re

from datasets import load_dataset

# Allowed next roles for each role (the FSM described above).
VALID_NEXT = {
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
    "answer": {"user"},
}

ds = load_dataset("AmanPriyanshu/tool-reasoning-sft-RESEARCH-REDSearcher_SFT_10K", split="train")
print(f"Loaded: {len(ds):,} rows\n")

# Pick a random row; `messages` is stored as a JSON string.
idx = random.randint(0, len(ds) - 1)
row = ds[idx]
msgs = json.loads(row["messages"])
lang = row["language"]
roles = [m["role"] for m in msgs]
tc = sum(1 for r in roles if r == "tool_call")
print(f"Row {idx} | language={lang} | {len(msgs)} turns | {tc} tool_calls")
print(f"Roles: {' -> '.join(roles[:20])}{'...' if len(roles) > 20 else ''}\n")

# ── Validation 1: FSM transitions ────────────────────────────────────────────
bad = [
    (j, roles[j], roles[j + 1])
    for j in range(len(roles) - 1)
    if roles[j + 1] not in VALID_NEXT.get(roles[j], set())
]
if bad:
    print(f"!! FSM VIOLATIONS: {len(bad)}")
    for pos, a, b in bad[:5]:
        print(f"  [{pos}] {a} -> {b}")
else:
    print("✓ FSM transitions: all valid")

# ── Validation 2: content tags ───────────────────────────────────────────────
tag_errors = []
for i, t in enumerate(msgs):
    r, c = t["role"], t["content"]
    if r == "reasoning":
        if not re.search(r"<think>.+</think>", c, re.DOTALL):
            tag_errors.append((i, r, "empty <think>"))
    elif r == "tool_call":
        if not re.search(r"<tool_call>.+</tool_call>", c, re.DOTALL):
            tag_errors.append((i, r, "empty <tool_call>"))
        else:
            # Parse the JSON payload between the outermost braces.
            blob = c[c.find("{"):c.rfind("}") + 1]
            try:
                obj = json.loads(blob)
                if "name" not in obj or "arguments" not in obj:
                    tag_errors.append((i, r, "missing name/arguments"))
            except json.JSONDecodeError as e:
                tag_errors.append((i, r, f"invalid JSON: {e}"))
    elif r == "answer":
        if not re.search(r"<answer>.+</answer>", c, re.DOTALL):
            tag_errors.append((i, r, "empty <answer>"))
    elif r == "tool_output":
        if not re.search(r"<tool_response>.+</tool_response>", c, re.DOTALL):
            tag_errors.append((i, r, "empty <tool_response>"))
if tag_errors:
    print(f"!! TAG ERRORS: {len(tag_errors)}")
    for pos, role, err in tag_errors[:5]:
        print(f"  [{pos}] {role}: {err}")
else:
    print("✓ Content tags: all valid")

# ── Validation 3: structure checks ───────────────────────────────────────────
checks = []
if roles[0] != "system":
    checks.append("first role is not system")
if roles[1] != "user":
    checks.append("second role is not user")
if roles[-1] != "answer":
    checks.append(f"last role is {roles[-1]}, expected answer")
dupes = [(i, roles[i]) for i in range(len(roles) - 1) if roles[i] == roles[i + 1]]
if dupes:
    checks.append(f"consecutive same-role at {dupes[0]}")
if checks:
    print(f"!! STRUCTURE ISSUES: {len(checks)}")
    for c in checks:
        print(f"  {c}")
else:
    print("✓ Structure: system→user→...→answer, no consecutive duplicates")

# ── Print turns ──────────────────────────────────────────────────────────────
print(f"\n{'=' * 70}")
print(f"FULL CONVERSATION ({len(msgs)} turns)")
print(f"{'=' * 70}\n")
for i, m in enumerate(msgs):
    content = m["content"]
    if m["role"] == "system":
        content = content[:200] + "..."   # system prompts are long; show a preview
    elif len(content) > 300:
        content = content[:300] + "..."
    print(f"[{i}] {m['role']}:\n{content}\n")
```

## Language Distribution

| Language | Rows |
|---|---|
| `en` | ~95% |
| `zh` | ~5% |
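
The decomposition described under Conversion Details can be illustrated with a minimal sketch. This is not the converter used to build the dataset: the helper names `split_assistant_message` and `convert_turn`, the sample conversation, and the `query` argument of the `search` call are all hypothetical, and the sketch omits schema extraction, bridge-reasoning synthesis, and validation.

```python
import re

# Hypothetical mapping from source XML tag to target FSM role.
TAG_TO_ROLE = {
    "think": "reasoning",
    "tool_call": "tool_call",
    "answer": "answer",
}

# Matches one complete <think>, <tool_call>, or <answer> block at a time.
BLOCK_RE = re.compile(r"<(think|tool_call|answer)>.*?</\1>", re.DOTALL)


def split_assistant_message(content: str) -> list[dict]:
    """Split a compound assistant message into one turn per tagged block."""
    return [
        {"role": TAG_TO_ROLE[m.group(1)], "content": m.group(0)}
        for m in BLOCK_RE.finditer(content)
    ]


def convert_turn(role: str, content: str) -> list[dict]:
    """Map one source turn onto zero or more FSM turns."""
    if role == "assistant":
        return split_assistant_message(content)
    if role == "user" and "<tool_response>" in content:
        # Tool results arrive as user messages in the source format.
        return [{"role": "tool_output", "content": content}]
    return [{"role": role, "content": content}]


# Made-up source conversation, for illustration only.
source = [
    ("system", "You are a deep-search assistant."),
    ("user", "Who founded the observatory?"),
    ("assistant", '<think>Search first.</think>\n'
                  '<tool_call>{"name": "search", "arguments": {"query": ["observatory founder"]}}</tool_call>'),
    ("user", "<tool_response>Founded by A. Example.</tool_response>"),
    ("assistant", "<think>The result names the founder.</think>\n<answer>A. Example</answer>"),
]

converted = [t for role, content in source for t in convert_turn(role, content)]
roles = [t["role"] for t in converted]
print(" -> ".join(roles))
# system -> user -> reasoning -> tool_call -> tool_output -> reasoning -> answer
```

An assistant message that contains only `<think>` yields a lone `reasoning` turn here; if a `<tool_response>` follows it, the result is the `reasoning→tool_output` violation that the card cites as the cause of failed conversions.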

Provided by:

AmanPriyanshu
