AmanPriyanshu/tool-reasoning-sft-RESEARCH-explorations
收藏Hugging Face2026-04-02 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-RESEARCH-explorations
下载链接
链接失效反馈官方服务:
资源简介:
---
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- code-exploration
- multi-step-reasoning
license: apache-2.0
size_categories:
- 100K<n<1M
---
# Explorations Trajectories — Cleaned & Stripped
149,025 multi-turn code exploration agent trajectories converted into a strict reasoning + tool-call format with validated FSM transitions.
## Origin
Derived from [AmanPriyanshu/random-small-github-repositories](https://huggingface.co/datasets/AmanPriyanshu/random-small-github-repositories) and [AmanPriyanshu/random-python-github-repositories](https://huggingface.co/datasets/AmanPriyanshu/random-python-github-repositories).
Each trajectory is a search session where an agent navigates a GitHub repository using terminal commands to locate a target file. The agent reasons about project structure, runs `grep`/`ls`/`find`/`cat` commands, and submits ranked file recommendations. Trajectories were filtered to the goldilocks zone (7–15 turns with successful file discovery) and expanded across 3 rounds per seed.
## Format
Each row contains a structured multi-turn conversation with explicit reasoning traces and validated tool calls.
### Message Roles
| Role | Content |
|---|---|
| `system` | Tool-use protocol + JSON tool schemas + code exploration agent instructions |
| `user` | Target file request, or synthetic rejection asking agent to keep searching |
| `reasoning` | `<think>…</think>` — model's step-by-step reasoning |
| `tool_call` | `<tool_call>{"name": "...", "arguments": {...}}</tool_call>` — function invocation |
| `tool_output` | `<tool_response>…</tool_response>` — tool execution result |
| `answer` | `<answer>…</answer>` — ranked file recommendation |
### Trajectory Structure
```
system → user → reasoning → [tool_call → tool_output → reasoning →]* answer
```
For trajectories that needed bridging (14.2%), mid-trajectory file submissions are converted into multi-turn rejection loops:
```
... → reasoning → answer → user(rejection) → reasoning(retry) → tool_call → ... → answer
```
Trajectories range from 4 to 58 turns, with an average of ~32 messages per row.
## Schema
Single Parquet file with zstd compression.
| Column | Type | Description |
|---|---|---|
| `messages` | string | Converted conversation (JSON list of `{role, content}`) |
| `repo_id` | string | Anonymized repository identifier |
| `dataset` | string | Source split (`small_repos` or `py_repos`) |
| `alpha_hash` | string | Hash identifier for the repository |
| `seed_group_idx` | int64 | Seed group index for trajectory generation |
| `seed_file_options` | string | JSON list of candidate target files |
| `seed_file_selected` | string | The actual target file the agent must find |
| `naming_style` | string | How the target was described (`semantic` or `direct`) |
| `found_in_turns` | int64 | Number of turns the agent took to find the file |
| `trajectory_round` | int64 | Which of the 3 generation rounds (1, 2, or 3) |
| `needed_to_bridge` | bool | Whether synthetic rejection messages were inserted |
## Tools
2 tools available per trajectory:
| Tool | Description |
|---|---|
| `terminal` | Execute a terminal command on a Linux machine (with optional `max_chars` truncation) |
| `submit_recommended_files` | Submit a ranked list of file paths as the agent's recommendation |
## Conversion Details
- Source trajectories use `type`/`text`/`name`/`arguments`/`output` fields — conversion maps these to the canonical 6-role FSM format
- `reasoning` outputs → `<think>…</think>` messages; consecutive reasoning blocks merged
- `function_call` outputs (terminal) → `<tool_call>` + `<tool_response>` pairs
- `function_call` outputs (submit_recommended_files) → `<answer>` with ranked file list
- `message` outputs (text-based submissions) → `<answer>` with raw content
- Mid-trajectory submissions (agent submits files then keeps exploring) → `answer → user(synthetic rejection) → reasoning(retry)` bridging with 12-variation template pools
- Bridge reasoning and tail reasoning drawn from 12 domain-appropriate templates each
- 100% conversion rate on all 149,025 rows; zero FSM violations, zero content-tag failures
- Two validation layers: FSM transition check + content-tag non-empty regex check + tool_call JSON schema check
## Distribution
| Split | Rows | % |
|---|---|---|
| `small_repos` | 87,591 | 58.8% |
| `py_repos` | 61,434 | 41.2% |
| Naming Style | Rows | % |
|---|---|---|
| `semantic` | 95,652 | 64.2% |
| `direct` | 53,373 | 35.8% |
| Bridged | Rows | % |
|---|---|---|
| `needed_to_bridge=False` | 127,790 | 85.8% |
| `needed_to_bridge=True` | 21,235 | 14.2% |
## Usage
```py
import json, random, re
from datasets import load_dataset
VALID_NEXT = {
"system": {"user"}, "user": {"reasoning"},
"reasoning": {"tool_call", "answer"}, "tool_call": {"tool_output"},
"tool_output": {"reasoning"}, "answer": {"user"},
}
ds = load_dataset("AmanPriyanshu/tool-reasoning-sft-RESEARCH-explorations", split="train")
print(f"Loaded: {len(ds):,} rows\n")
idx = random.randint(0, len(ds) - 1)
row = ds[idx]
msgs = json.loads(row["messages"])
roles = [m["role"] for m in msgs]
tc = sum(1 for r in roles if r == "tool_call")
print(f"Row {idx} | repo={row['repo_id']} | target={row['seed_file_selected']}")
print(f" {len(msgs)} turns | {tc} tool_calls | bridged={row['needed_to_bridge']}")
print(f" Roles: {' -> '.join(roles[:15])}{'...' if len(roles)>15 else ''}\n")
# ── Validation 1: FSM transitions
bad = [(j, roles[j], roles[j+1]) for j in range(len(roles)-1)
if roles[j+1] not in VALID_NEXT.get(roles[j], set())]
if bad:
print(f"!! FSM VIOLATIONS: {len(bad)}")
for pos, a, b in bad[:5]:
print(f" [{pos}] {a} -> {b}")
else:
print("✓ FSM transitions: all valid")
# ── Validation 2: content tags
tag_ok = True
for i, t in enumerate(msgs):
r, c = t["role"], t["content"]
if r == "reasoning" and not re.search(r'<think>.+</think>', c, re.DOTALL):
tag_ok = False
elif r == "tool_call" and not re.search(r'<tool_call>.+</tool_call>', c, re.DOTALL):
tag_ok = False
elif r == "answer" and not re.search(r'<answer>.+</answer>', c, re.DOTALL):
tag_ok = False
elif r == "tool_output" and not re.search(r'<tool_response>.+</tool_response>', c, re.DOTALL):
tag_ok = False
print(f"{'✓' if tag_ok else '!!'} Content tags: {'all valid' if tag_ok else 'errors found'}")
# ── Print sample turns
print(f"\n{'='*70}")
for i, m in enumerate(msgs[:10]):
content = m["content"]
if m["role"] == "system":
content = content[:150] + "..."
elif len(content) > 200:
content = content[:200] + "..."
print(f"[{i}] {m['role']}:\n{content}\n")
if len(msgs) > 10:
print(f"... ({len(msgs) - 10} more turns)")
```
提供机构:
AmanPriyanshu



