AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source
Hugging Face · updated 2026-03-25 · indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- retrieval
- RAG
- search
- rlvr
size_categories:
- 100K<n<1M
---
# Tool-Reasoning SFT — RLVR Retrieval Source Trajectories
156,381 multi-turn agentic retrieval trajectories across three document corpora, in a strict reasoning + tool-call format whose role transitions are validated against a finite-state machine (FSM). Each trajectory records a model searching a corpus, opening documents, and citing relevant passages to answer a question.
**Author:** [Aman Priyanshu](https://huggingface.co/AmanPriyanshu)
## Source Environments
Trajectories were collected against three RLVR retrieval environments from the [FORMAT: Search - Retrieve RLVR](https://huggingface.co/collections/AmanPriyanshu/format-search-retrieve-rlvr) collection:
| File | Source Environment | Domain | Rows |
|---|---|---|---|
| `nvdocs.parquet` | [RLVR-Env-Retrieval-Source-Retrieval-Synthetic-NVDocs-v1](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-Retrieval-Synthetic-NVDocs-v1) | NVIDIA technical documentation | 49,696 |
| `search-py.parquet` | [RLVR-Env-Retrieval-Source-code-search-net-python](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-code-search-net-python) | Python functions (CodeSearchNet) | 54,033 |
| `search-js.parquet` | [RLVR-Env-Retrieval-Source-code-search-net-javascript](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-code-search-net-javascript) | JavaScript functions (CodeSearchNet) | 52,652 |
Each source environment provides 100k QA pairs with ground-truth chunks, random distractors, and hard-negative distractors. Roughly half of each environment's pairs (49.7k to 54.0k per file) were successfully converted into trajectories.
## Tools
Three retrieval tools are available in every trajectory:
| Tool | Description |
|---|---|
| `semantic_search` | Searches documents by semantic similarity, returns top-5 snippets with doc_ids and scores |
| `regex_search` | Searches documents using a regex pattern, returns top-5 matches with doc_ids and context |
| `open_document` | Opens and reads the full text of a specific document by its doc_id |
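To make the tool contract concrete, here is a toy, in-memory sketch of `regex_search` and `open_document`. It is not the environment's implementation: the corpus, the result field names (`doc_id`, `context`), the 20-character context window, and the error string are all assumptions. `semantic_search` is omitted because it needs an embedding model.

```py
import re
from typing import Dict, List

# Hypothetical toy corpus: doc_id -> full text (not taken from the dataset)
CORPUS: Dict[str, str] = {
    "doc-1": "def add(a, b):\n    return a + b",
    "doc-2": "function add(a, b) { return a + b; }",
    "doc-3": "CUDA streams allow overlapping kernel execution and memory copies.",
}


def regex_search(pattern: str, top_k: int = 5, context: int = 20) -> List[dict]:
    """Return up to top_k matches as {doc_id, context} dicts (field names are assumptions)."""
    rx = re.compile(pattern)
    results = []
    for doc_id, text in CORPUS.items():
        m = rx.search(text)
        if m:
            # Keep a small window of text around the first match in this document
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            results.append({"doc_id": doc_id, "context": text[start:end]})
        if len(results) == top_k:
            break
    return results


def open_document(doc_id: str) -> str:
    """Return the full text of a document, or a hypothetical error string if the id is unknown."""
    return CORPUS.get(doc_id, f"Error: unknown doc_id {doc_id!r}")
```

In a trajectory, a `regex_search` hit gives the model a `doc_id` that it can then pass to `open_document` to read the full text before citing it.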
## Format
### Message Roles
| Role | Content |
|---|---|
| `system` | Tool-use protocol + tool schemas + retrieval instructions |
| `user` | Question to answer using the document corpus |
| `reasoning` | `<think>…</think>` — model's step-by-step reasoning |
| `tool_call` | `<tool_call>{"name": "...", "arguments": {...}}</tool_call>` — function invocation |
| `tool_output` | `<tool_response>…</tool_response>` — tool execution result |
| `answer` | `<answer>…</answer>` — final response citing retrieved documents |
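A `tool_call` message body can be unpacked with a few lines of Python. This is a minimal sketch assuming the content is exactly one `<tool_call>…</tool_call>` wrapper around a JSON object with `name` and `arguments` keys, as in the table; `parse_tool_call` is a hypothetical helper, and the `doc_id` argument name used in the example is likewise an assumption.

```py
import json
import re
from typing import Any, Dict, Tuple

# Matches a single <tool_call>{...}</tool_call> wrapper, allowing surrounding whitespace
TOOL_CALL_RE = re.compile(r"^\s*<tool_call>(.*)</tool_call>\s*$", re.DOTALL)


def parse_tool_call(content: str) -> Tuple[str, Dict[str, Any]]:
    """Extract (name, arguments) from a `tool_call` message's content."""
    m = TOOL_CALL_RE.match(content)
    if m is None:
        raise ValueError("content is not wrapped in <tool_call>...</tool_call>")
    call = json.loads(m.group(1))
    return call["name"], call.get("arguments", {})


# Example with a hypothetical argument name
name, args = parse_tool_call(
    '<tool_call>{"name": "open_document", "arguments": {"doc_id": "d42"}}</tool_call>'
)
print(name, args)
```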
### Trajectory Structure
```
system → user → reasoning → [tool_call → tool_output → reasoning →]* answer
```
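The trajectory grammar above can be checked with a small finite-state machine. The sketch below is a hypothetical validator (`TRANSITIONS` and `is_valid_trajectory` are illustrative names, not part of the dataset's tooling) that takes a list of role strings, such as the `roles` list built in the Usage snippet, and accepts it only if it follows the grammar and ends on `answer`.

```py
from typing import Dict, List, Set

# Allowed role transitions, transcribed from the trajectory grammar above.
# "answer" has no outgoing transitions, so it must be the final role.
TRANSITIONS: Dict[str, Set[str]] = {
    "START": {"system"},
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
}


def is_valid_trajectory(roles: List[str]) -> bool:
    """Return True if `roles` matches system -> user -> reasoning -> [tool_call -> tool_output -> reasoning]* -> answer."""
    state = "START"
    for role in roles:
        if role not in TRANSITIONS.get(state, set()):
            return False
        state = role
    return state == "answer"
```

Encoding the grammar as a transition table rather than a regular expression makes it easy to report exactly which transition failed, if you extend the function to return the offending index.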
## Schema
| Column | Type | Description |
|---|---|---|
| `messages` | string | Converted conversation (JSON list of `{role, content}`) |
| `qa_id` | string | Unique question ID (matches source environment) |
| `status` | string | `completed` or `completed_forced` |
| `gt_seen_in_search` | bool | Ground-truth document appeared in search results |
| `gt_opened` | bool | Ground-truth document was opened |
| `gt_cited` | bool | Ground-truth document was cited in the answer |
| `cited_doc_ids` | string | JSON list of all cited document IDs |
| `n_searches` | int | Number of `semantic_search` + `regex_search` calls |
| `n_opens` | int | Number of `open_document` calls |
| `n_tool_actions` | int | Total tool calls in the trajectory |
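The three count columns can in principle be recomputed from `messages`. The sketch below (hypothetical helper `count_tool_actions`) assumes every tool invocation appears as one `tool_call` message in the wrapper format described under Message Roles. The stored columns may be computed differently: in the Statistics table, average tool actions per row slightly exceed average searches plus opens for `nvdocs` and `search-py`, so treat mismatches as something to inspect rather than as errors.

```py
import json
import re
from collections import Counter
from typing import Dict

SEARCH_TOOLS = {"semantic_search", "regex_search"}


def count_tool_actions(messages_json: str) -> Dict[str, int]:
    """Recount tool calls by name from the JSON-encoded `messages` column."""
    msgs = json.loads(messages_json)
    names: Counter = Counter()
    for m in msgs:
        if m["role"] != "tool_call":
            continue
        # Strip the <tool_call>...</tool_call> wrapper before decoding the JSON body
        body = re.sub(r"^\s*<tool_call>|</tool_call>\s*$", "", m["content"])
        names[json.loads(body)["name"]] += 1
    return {
        "n_searches": sum(names[t] for t in SEARCH_TOOLS),
        "n_opens": names["open_document"],
        "n_tool_actions": sum(names.values()),
    }
```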
## Statistics
| Metric | nvdocs | search-py | search-js |
|---|---|---|---|
| Rows | 49,696 | 54,033 | 52,652 |
| Completed | 48,175 (96.9%) | 50,850 (94.1%) | 46,772 (88.8%) |
| Completed (forced) | 1,521 (3.1%) | 3,183 (5.9%) | 5,880 (11.2%) |
| GT seen in search | 47,236 (95.0%) | 49,111 (90.9%) | 38,242 (72.6%) |
| GT opened | 43,696 (87.9%) | 47,183 (87.3%) | 32,138 (61.0%) |
| GT cited | 38,983 (78.4%) | 42,868 (79.3%) | 27,028 (51.3%) |
| Avg searches/row | 16.2 | 19.3 | 21.3 |
| Avg opens/row | 7.5 | 4.7 | 4.5 |
| Avg tool actions/row | 23.9 | 24.2 | 25.8 |
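The Statistics table can be regenerated from the per-row columns. Below is a minimal sketch with a hypothetical `summarize` helper that takes rows as plain dicts (for example from `pyarrow.parquet.read_table(path).to_pylist()`) and formats counts and percentages the way the table does. It assumes the published figures were computed over all rows of each file.

```py
from typing import Dict, Iterable


def summarize(rows: Iterable[Dict]) -> Dict[str, str]:
    """Format one file's column of the Statistics table from its rows."""
    data = list(rows)
    n = len(data)
    if n == 0:
        raise ValueError("no rows to summarize")

    def pct(count: int) -> str:
        # e.g. 48175 of 49696 -> "48,175 (96.9%)"
        return f"{count:,} ({100 * count / n:.1f}%)"

    def avg(col: str) -> str:
        return f"{sum(r[col] for r in data) / n:.1f}"

    return {
        "Rows": f"{n:,}",
        "Completed": pct(sum(r["status"] == "completed" for r in data)),
        "Completed (forced)": pct(sum(r["status"] == "completed_forced" for r in data)),
        "GT seen in search": pct(sum(bool(r["gt_seen_in_search"]) for r in data)),
        "GT opened": pct(sum(bool(r["gt_opened"]) for r in data)),
        "GT cited": pct(sum(bool(r["gt_cited"]) for r in data)),
        "Avg searches/row": avg("n_searches"),
        "Avg opens/row": avg("n_opens"),
        "Avg tool actions/row": avg("n_tool_actions"),
    }
```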
## Generation
Trajectories were collected using `gpt-oss-120b` served via vLLM on 8×H100 GPUs. The model was given access to the three retrieval tools and asked to find and cite the relevant document(s) to answer each question. The trajectories were then corrected via replay to fix tool-output mismatches and converted into the canonical six-role FSM format (the roles listed under Message Roles), with every transition validated.
## Usage
```py
import json
import random

import pyarrow.parquet as pq
from huggingface_hub import hf_hub_download

REPO = "AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source"

for fname in ["nvdocs.parquet", "search-py.parquet", "search-js.parquet"]:
    print(f"\n{'=' * 60}")
    print(f"Downloading {fname}...")
    # Download one parquet file from the dataset repo (cached locally after the first call)
    local = hf_hub_download(REPO, fname, repo_type="dataset")
    t = pq.read_table(local)

    # Pick a random row and convert it to a plain Python dict
    idx = random.randint(0, t.num_rows - 1)
    row = {col: t.column(col)[idx].as_py() for col in t.column_names}

    # `messages` is stored as a JSON string; decode it into a list of {role, content} dicts
    msgs = json.loads(row["messages"])
    roles = [m["role"] for m in msgs]

    print(f"{fname}: {t.num_rows:,} rows")
    print(f"Row {idx} | qa_id={row['qa_id']} | status={row['status']}")
    print(f" gt_seen={row['gt_seen_in_search']} gt_opened={row['gt_opened']} gt_cited={row['gt_cited']}")
    print(f" n_searches={row['n_searches']} n_opens={row['n_opens']} n_tool_actions={row['n_tool_actions']}")
    print(f" {len(msgs)} turns | Roles: {' -> '.join(roles[:20])}{'...' if len(roles) > 20 else ''}\n")

    # Show the first 8 turns, truncating the long system prompt and other long contents
    for m in msgs[:8]:
        c = m["content"]
        if m["role"] == "system":
            c = c[:150] + "..."
        elif len(c) > 300:
            c = c[:300] + "..."
        print(f"[{m['role']}]\n{c}\n")
    if len(msgs) > 8:
        print(f"... ({len(msgs) - 8} more turns)")
```
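For SFT it can be useful to keep only a subset of trajectories. The filter below is a hypothetical example, not a recommendation from the dataset author: `select_for_sft` drops `completed_forced` rows and rows where the ground-truth document was not cited, then decodes `messages` for the rows that remain. Both criteria are assumptions to adjust for your use case.

```py
import json
from typing import Dict, Iterable, Iterator, List


def select_for_sft(
    rows: Iterable[Dict],
    require_cited: bool = True,
    allow_forced: bool = False,
) -> Iterator[List[Dict]]:
    """Yield decoded message lists for rows that pass simple quality filters."""
    for row in rows:
        # Optionally skip trajectories that ended with status `completed_forced`
        if row["status"] == "completed_forced" and not allow_forced:
            continue
        # Optionally require that the ground-truth document was cited in the answer
        if require_cited and not row["gt_cited"]:
            continue
        yield json.loads(row["messages"])
```

With the table `t` from the Usage snippet, `list(select_for_sft(t.to_pylist()))` returns the filtered conversations.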
Provided by: AmanPriyanshu



