AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source

Name: AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source
Creator: AmanPriyanshu
Published: 2026-03-25 23:11:18
License: apache-2.0

Source: Hugging Face. Updated: 2026-03-25. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source

Resource description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- reasoning
- tool-calling
- agentic
- multi-turn
- retrieval
- RAG
- search
- rlvr
size_categories:
- 100K<n<1M
---

# Tool-Reasoning SFT: RLVR Retrieval Source Trajectories

156,381 multi-turn agentic retrieval trajectories across three document corpora, in a strict reasoning + tool-call format with validated FSM transitions. Each trajectory records a model searching a corpus, opening documents, and citing relevant passages to answer a question.

**Author:** [Aman Priyanshu](https://huggingface.co/AmanPriyanshu)

## Source Environments

Trajectories were collected against three RLVR retrieval environments from the [FORMAT: Search - Retrieve RLVR](https://huggingface.co/collections/AmanPriyanshu/format-search-retrieve-rlvr) collection:

| File | Source Environment | Domain | Rows |
|---|---|---|---|
| `nvdocs.parquet` | [RLVR-Env-Retrieval-Source-Retrieval-Synthetic-NVDocs-v1](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-Retrieval-Synthetic-NVDocs-v1) | NVIDIA technical documentation | 49,696 |
| `search-py.parquet` | [RLVR-Env-Retrieval-Source-code-search-net-python](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-code-search-net-python) | Python functions (CodeSearchNet) | 54,033 |
| `search-js.parquet` | [RLVR-Env-Retrieval-Source-code-search-net-javascript](https://huggingface.co/datasets/AmanPriyanshu/RLVR-Env-Retrieval-Source-code-search-net-javascript) | JavaScript functions (CodeSearchNet) | 52,652 |

Each source environment provides 100k QA pairs with ground-truth chunks, random distractors, and hard-negative distractors. Roughly half were successfully converted into trajectories.

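
The "roughly half" figure can be made concrete from the row counts in the table above. The sketch below assumes each environment contributed exactly 100,000 QA pairs; the card only says "100k", so the denominator is an assumption and the percentages are approximate.

```python
# Row counts copied from the Source Environments table.
rows = {"nvdocs": 49_696, "search-py": 54_033, "search-js": 52_652}

# Assumption: "100k QA pairs" is read as exactly 100,000 per environment.
QA_PAIRS_PER_ENV = 100_000

total = sum(rows.values())
print(f"total trajectories: {total:,}")  # matches the 156,381 headline figure

for name, n in rows.items():
    print(f"{name}: {n / QA_PAIRS_PER_ENV:.1%} converted")

overall = total / (QA_PAIRS_PER_ENV * len(rows))
print(f"overall: {overall:.1%}")
```

Under that assumption the per-file conversion rates all fall between roughly 50% and 54%, and the three files sum to the stated total.
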
## Tools

Three retrieval tools are available in every trajectory:

| Tool | Description |
|---|---|
| `semantic_search` | Searches documents by semantic similarity, returns top-5 snippets with doc_ids and scores |
| `regex_search` | Searches documents using a regex pattern, returns top-5 matches with doc_ids and context |
| `open_document` | Opens and reads the full text of a specific document by its doc_id |

## Format

### Message Roles

| Role | Content |
|---|---|
| `system` | Tool-use protocol + tool schemas + retrieval instructions |
| `user` | Question to answer using the document corpus |
| `reasoning` | `<think>…</think>`: model's step-by-step reasoning |
| `tool_call` | `<tool_call>{"name": "...", "arguments": {...}}</tool_call>`: function invocation |
| `tool_output` | `<tool_response>…</tool_response>`: tool execution result |
| `answer` | `<answer>…</answer>`: final response citing retrieved documents |

### Trajectory Structure

```
system → user → reasoning → [tool_call → tool_output → reasoning →]* answer
```

## Schema

| Column | Type | Description |
|---|---|---|
| `messages` | string | Converted conversation (JSON list of `{role, content}`) |
| `qa_id` | string | Unique question ID (matches source environment) |
| `status` | string | `completed` or `completed_forced` |
| `gt_seen_in_search` | bool | Ground-truth document appeared in search results |
| `gt_opened` | bool | Ground-truth document was opened |
| `gt_cited` | bool | Ground-truth document was cited in the answer |
| `cited_doc_ids` | string | JSON list of all cited document IDs |
| `n_searches` | int | Number of `semantic_search` + `regex_search` calls |
| `n_opens` | int | Number of `open_document` calls |
| `n_tool_actions` | int | Total tool calls in the trajectory |

## Statistics

| Metric | nvdocs | search-py | search-js |
|---|---|---|---|
| Rows | 49,696 | 54,033 | 52,652 |
| Completed | 48,175 (96.9%) | 50,850 (94.1%) | 46,772 (88.8%) |
| Completed (forced) | 1,521 (3.1%) | 3,183 (5.9%) | 5,880 (11.2%) |
| GT seen in search | 47,236 (95.0%) | 49,111 (90.9%) | 38,242 (72.6%) |
| GT opened | 43,696 (87.9%) | 47,183 (87.3%) | 32,138 (61.0%) |
| GT cited | 38,983 (78.4%) | 42,868 (79.3%) | 27,028 (51.3%) |
| Avg searches/row | 16.2 | 19.3 | 21.3 |
| Avg opens/row | 7.5 | 4.7 | 4.5 |
| Avg tool actions/row | 23.9 | 24.2 | 25.8 |

## Generation

Trajectories were collected using `gpt-oss-120b` served via vLLM on 8×H100 GPUs. The model was given access to three retrieval tools and asked to find and cite the relevant document(s) to answer each question. Trajectories were then corrected via replay to fix tool output mismatches, and converted into the canonical 6-role FSM format with validated transitions.

## Usage

```py
import json, random

import pyarrow.parquet as pq
from huggingface_hub import hf_hub_download

REPO = "AmanPriyanshu/tool-reasoning-sft-RESEARCH-rlvr-env-retrieval-source"

for fname in ["nvdocs.parquet", "search-py.parquet", "search-js.parquet"]:
    print(f"\n{'='*60}")
    print(f"Downloading {fname}...")
    local = hf_hub_download(REPO, fname, repo_type="dataset")
    t = pq.read_table(local)

    # Pick one random row and convert it to a plain Python dict.
    idx = random.randint(0, t.num_rows - 1)
    row = {col: t.column(col)[idx].as_py() for col in t.column_names}

    # `messages` is stored as a JSON string, so decode it before use.
    msgs = json.loads(row["messages"])
    roles = [m["role"] for m in msgs]

    print(f"{fname}: {t.num_rows:,} rows")
    print(f"Row {idx} | qa_id={row['qa_id']} | status={row['status']}")
    print(f"  gt_seen={row['gt_seen_in_search']} gt_opened={row['gt_opened']} gt_cited={row['gt_cited']}")
    print(f"  n_searches={row['n_searches']} n_opens={row['n_opens']} n_tool_actions={row['n_tool_actions']}")
    print(f"  {len(msgs)} turns | Roles: {' -> '.join(roles[:20])}{'...' if len(roles)>20 else ''}\n")

    # Show the first 8 turns, truncating long content for readability.
    for m in msgs[:8]:
        c = m["content"]
        if m["role"] == "system":
            c = c[:150] + "..."
        elif len(c) > 300:
            c = c[:300] + "..."
        print(f"[{m['role']}]\n{c}\n")
    if len(msgs) > 8:
        print(f"... ({len(msgs)-8} more turns)")
```
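
The role grammar and the `tool_call` payload format described under Format can also be checked mechanically. The sketch below is one reading of the documented pattern `system → user → reasoning → [tool_call → tool_output → reasoning →]* answer`, not the validator used to build the dataset. The names `ALLOWED`, `is_valid_trajectory`, `TOOL_CALL_RE`, and `parse_tool_call` are hypothetical, and the `query` argument in the example payload is invented for illustration, since the tool schemas are not listed in this card.

```python
import json
import re

# Allowed next roles for each role, read off the documented pattern.
# Assumption: this table is derived from the card, not from the dataset's own validator.
ALLOWED = {
    "system": {"user"},
    "user": {"reasoning"},
    "reasoning": {"tool_call", "answer"},
    "tool_call": {"tool_output"},
    "tool_output": {"reasoning"},
    "answer": set(),
}

# Matches the JSON payload wrapped in <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def is_valid_trajectory(roles):
    """Return True if the role sequence follows the documented pattern."""
    if not roles or roles[0] != "system" or roles[-1] != "answer":
        return False
    return all(nxt in ALLOWED.get(cur, set()) for cur, nxt in zip(roles, roles[1:]))


def parse_tool_call(content):
    """Return the decoded JSON payload of a tool_call message, or None if no tags are found."""
    match = TOOL_CALL_RE.search(content)
    return json.loads(match.group(1)) if match else None


# Hand-written examples for illustration (not rows from the dataset).
good = ["system", "user", "reasoning", "tool_call", "tool_output", "reasoning", "answer"]
bad = ["system", "user", "tool_call", "tool_output", "answer"]
example = '<tool_call>{"name": "semantic_search", "arguments": {"query": "cuda streams"}}</tool_call>'

print(is_valid_trajectory(good), is_valid_trajectory(bad))
print(parse_tool_call(example)["name"])
```

For a dataset row, `roles` would be `[m["role"] for m in json.loads(row["messages"])]`, as in the Usage snippet. Whether `completed_forced` rows follow exactly the same transitions is not stated in the card, so a failed check on such a row is something to inspect rather than proof of corruption.
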

Provided by:

AmanPriyanshu
