sunorme/astra_rlvr

Name: sunorme/astra_rlvr
Creator: sunorme
Published: 2026-03-22 03:15:34
License: No description provided

Source: Hugging Face. Updated 2026-03-22; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/sunorme/astra_rlvr

Description:

---
language:
- en
license: apache-2.0
task_categories:
- reinforcement-learning
tags:
- agents
- tool-use
- function-calling
- synthetic-data
- rlvr
pretty_name: ASTRA RLVR Environments Dataset (Environment Synthesis)
---

[![GitHub](https://img.shields.io/badge/GitHub-Astra-blue?logo=github)](https://github.com/YOUR_ORG/Astra)
[![Blog](https://img.shields.io/badge/Blog-Project%20Page-orange?logo=github)](https://kurisu0306.github.io/astra.github.io/)
[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Datasets-yellow)](https://huggingface.co/datasets/YOUR_ORG)
[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Models-yellow)](https://huggingface.co/YOUR_ORG)
[![Paper](https://img.shields.io/badge/📄%20Paper-Coming%20Soon-lightgrey)](https://arxiv.org)

# ASTRA RLVR Dataset

**RLVR Dataset** released by **ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas**. The RLVR data is designed for training and evaluating **tool use + multi-step reasoning** with **verifiable rewards** in executable environments.

## 1. Dataset Overview

- **RLVR environments (Environment Synthesis)**: starting from QA pairs, we automatically decompose a main question into sub-questions and generate an **executable tool environment** (tool documentation / call statements / code) for sub-questions that require tools. We then apply rule-based and sandbox-execution verification and keep only the verifiable samples.

## 2. Data Construction Pipeline

![Environment Synthesis Pipeline](assets/env.png)

- **Start from QA / knowledge sources:** Generate a main question and construct a multi-hop decomposition trajectory.
- **Decide tool needs + score verification:** For each sub-question, determine whether tools are required and compute multi-dimensional verification scores (often using a P90 threshold to select high-quality samples).
- **Auto-generate and execute for tool-required sub-questions:** Automatically generate tool documentation, call statements, and executable code, then run sandbox execution to filter for verifiable samples.
- **Cluster and merge similar tools:** Group tools with similar intents and merge them where appropriate. Re-run sandbox validation to ensure executability and verifiability.

## 3. Data Formats and Field Definitions

Each sample is a JSON object. Common top-level fields include:

- `prompt`: the dialog prompt (usually two messages: system + user)
- `synthetic_env_tool_schema`: tool schema (string; JSON-serialized OpenAI tools/function schema list)
- `synthetic_env_tool_dict`: tool implementation (string; JSON-serialized dict)
  - After deserialization: `{tool_name: python_code_string, ...}`
- `synthetic_env_sub_qa_dict_for_verify`: verification assertions (string; JSON-serialized dict)
  - After deserialization: `{tool_name: [expected_substrings...], ...}`, used for sandbox validation (e.g., checking whether `tool_call_ans` contains the expected answer snippets)
- `synthetic_env_sub_qa_dict`: a simplified version of the sub-question answers/constraints (similar to the verify version; varies by release)
- Others: meta fields such as `ability`, `agent_name`, `extra_info`, etc.

> Note: the field `synthetic_env_sub_qa_reward` may be empty in some versions (kept for future extensions such as process-level rewards / rule signals).

## 4. Usage (HuggingFace Datasets)

```python
import json

from datasets import load_dataset

# "TODO/astra_rlvr" is the placeholder repo id from the original card;
# substitute the actual repo id before running.
ds = load_dataset("TODO/astra_rlvr", "rlvr_envs", split="train")
ex = ds[0]

# These fields are stored as JSON strings and must be deserialized before use.
tools_schema = json.loads(ex["synthetic_env_tool_schema"])  # list[dict]
tool_code_map = json.loads(ex["synthetic_env_tool_dict"])  # dict[str, str]
verify_map = json.loads(ex["synthetic_env_sub_qa_dict_for_verify"])  # dict[str, list[str]]
```

## 5. Disclaimer

- **Non-endorsement & liability disclaimer**: The dataset content is provided for research and educational purposes only. It does not reflect the views, interests, beliefs, or endorsements of any individual or organization, and should not be interpreted as making claims about any group. The project maintainers disclaim responsibility for any direct or indirect harm or damages arising from the use or misuse of the dataset or related resources.
- **Partial release due to policy constraints**: Due to company policies and compliance requirements, only a subset of the full dataset is publicly released, which may limit coverage and representativeness.

## 6. Citation

```bibtex
@misc{astra2026,
  title={ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas},
  author={Beike Language and Intelligence (BLI)},
  year={2026}
}
```
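
As a supplement to the field definitions in Section 3, the substring check described for `synthetic_env_sub_qa_dict_for_verify` might look roughly like the sketch below. This is not the project's actual verifier. The function name `check_tool_answers`, the toy sample, the tool names, and the rule that *every* expected substring must appear (case-sensitively) are all assumptions made for illustration. The sketch only compares strings and does not execute any tool code from the dataset.

```python
import json


def check_tool_answers(sample, tool_call_answers):
    """Return {tool_name: bool} for one sample.

    Assumed rule (not confirmed by the dataset card): a tool passes only if
    every expected substring occurs in that tool's answer, case-sensitively.
    """
    # The verify field is stored as a JSON string: {tool_name: [expected_substrings...]}
    verify_map = json.loads(sample["synthetic_env_sub_qa_dict_for_verify"])
    results = {}
    for tool_name, expected_substrings in verify_map.items():
        # A missing answer is treated as an empty string, so it fails the check.
        answer = str(tool_call_answers.get(tool_name, ""))
        results[tool_name] = all(str(s) in answer for s in expected_substrings)
    return results


# Hypothetical sample that mimics the serialized field layout.
sample = {
    "synthetic_env_sub_qa_dict_for_verify": json.dumps(
        {"get_capital": ["Paris"], "get_population": ["67", "million"]}
    )
}
# Hypothetical tool outputs (the card calls such an output `tool_call_ans`).
answers = {
    "get_capital": "The capital of France is Paris.",
    "get_population": "About 67 people",
}
results = check_tool_answers(sample, answers)
print(results)
```

Requiring all substrings is the stricter reading of "contains expected answer snippets"; swapping `all` for `any` gives the looser one, and which of the two the released pipeline uses is not stated in the card.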

Provider:

sunorme
