alibaba-pai/AgenticQwen-Data

Name: alibaba-pai/AgenticQwen-Data
Creator: alibaba-pai
Published: 2026-03-16 08:50:39
License: No description available

Source: Hugging Face. Updated: 2026-03-16. Indexed: 2026-03-21.

Download link:

https://hf-mirror.com/datasets/alibaba-pai/AgenticQwen-Data


Resource description:

# Dataset Overview

This dataset contains synthetic training examples for agentic RL. Rather than simple prompt-response pairs, each sample is a self-contained agent task with a user goal, hidden scenario context, tool interfaces, operational constraints, adversarial pressure, and verifiable success criteria. The data is generated or expanded by LLMs to create diverse workflows, tool ecosystems, and failure modes. As a result, the dataset is designed not just to train models to respond, but to train them to act as agents: ask for missing information, follow procedures, use tools correctly, resist unsafe shortcuts, and complete tasks through multi-turn interaction.

# What’s in the Data

Typical examples include:

- User prompt: Natural-language user request initiating the workflow (e.g., “switch my hydro service to my new place…”).
- System prompt (SOP): Agent instructions describing scope, required checks, allowed/disallowed actions, refusal/transfer conditions, and interaction requirements (e.g., must verify identity first, must ask clarifying questions, user confirmation required before writes).
- System prompt (tool definitions): Function/tool schemas provided to the agent (names, descriptions, parameters, required fields, and structured outputs), typically embedded in a ... block.
- Task background: Additional scenario context and latent details (IDs, dates, proof tokens, constraints, and “only reveal if asked” fields) used to support multi-turn clarification and to test whether the agent requests missing parameters rather than guessing.
- Rubrics / success criteria: Explicit pass/fail conditions tied to verifiable state changes (e.g., which tools must or must not be called; database state must remain unchanged).
- User escape / adversarial strategy: How the user may pressure the agent to bypass required checks or violate policy.
- Environment states / expected tool returns: A structured tool_return_expected section providing the expected outputs for:
  - a normal_path (compliant sequence, often starting with a query/validation tool), and sometimes
  - a hack_path (non-compliant sequence illustrating policy-violating tool use), including expected failure responses from tools when prerequisites are not met.
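To make the rubric idea concrete, the following is a minimal sketch of how pass/fail conditions of the kind described above (tools that must or must not be called, database state that must stay unchanged) could be checked against a finished episode. It is not the dataset's actual schema or grader: the `Rubric` class, its field names, the `check_rubric` helper, and the tool names `verify_identity` and `transfer_service` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    # Hypothetical field names; the dataset's real rubric format may differ.
    must_call: set = field(default_factory=set)
    must_not_call: set = field(default_factory=set)
    db_must_be_unchanged: bool = False


def check_rubric(rubric, called_tools, db_before, db_after):
    """Return (passed, reasons) for one finished agent episode."""
    reasons = []
    called = set(called_tools)

    # Required tools that the agent never invoked.
    missing = rubric.must_call - called
    if missing:
        reasons.append(f"missing required tool calls: {sorted(missing)}")

    # Tools the agent invoked even though the rubric forbids them.
    forbidden = rubric.must_not_call & called
    if forbidden:
        reasons.append(f"forbidden tool calls: {sorted(forbidden)}")

    # Verifiable state change: compare snapshots taken before and after.
    if rubric.db_must_be_unchanged and db_before != db_after:
        reasons.append("database state changed")

    return (not reasons, reasons)


# Illustrative task: the user pressures the agent to skip identity
# verification, so the compliant outcome is to verify and write nothing.
rubric = Rubric(
    must_call={"verify_identity"},
    must_not_call={"transfer_service"},
    db_must_be_unchanged=True,
)
db = {"account_42": {"address": "old"}}

# Compliant episode: only the validation tool is called, state is untouched.
compliant = check_rubric(rubric, ["verify_identity"], db, dict(db))

# Non-compliant episode: the write tool is called without verification.
hacked = check_rubric(
    rubric,
    ["transfer_service"],
    db,
    {"account_42": {"address": "new"}},
)
```

Here `compliant` passes with no reasons, while `hacked` fails on all three conditions. Keeping the check as a pure function over the tool-call list and two state snapshots is one possible design; it makes the outcome verifiable without inspecting the agent's text output.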

Provider:

alibaba-pai
