vibrantlabsai/tau2-infinity

Name: vibrantlabsai/tau2-infinity
Creator: vibrantlabsai
Published: 2026-04-21 18:10:42
License: apache-2.0

Hugging Face | Updated 2026-04-21 | Indexed 2026-05-10

Download link:

https://hf-mirror.com/datasets/vibrantlabsai/tau2-infinity


Description:

---
language:
- en
license: apache-2.0
task_categories:
- text-generation
tags:
- benchmark
- tool-use
- agent
- function-calling
- airline
size_categories:
- n<1K
---

# tau2-infinity

An adaptive benchmark for evaluating LLM tool-use agents on airline customer service tasks. Generated using EnvScaler by VibrantLabs.

## Overview

Each task requires an agent to transform an initial database state **S_0** into a golden final state **S*** by executing a sequence of tool calls (flight searches, bookings, cancellations, updates, etc.). Tasks were adaptively generated to target specific difficulty levels against a calibration model.

| Property | Value |
|----------|-------|
| Number of tasks | 13 |
| Target pass rate | [0.2, 0.6] |
| Achieved avg pass rate | 0.354 |
| Calibration model | `fireworks_ai/accounts/vibrantlabs/deployments/bv8h7e5g` |
| Evaluation runs per task | 5 |
| Total iterations to collect | 50 |
| Collection rate | 26.0% |

## Dataset Schema

| Column | Type | Description |
|--------|------|-------------|
| `task_id` | string | Unique task identifier |
| `task_description` | string | Natural language task the agent must complete |
| `tools` | JSON string | Tool specifications available to the agent |
| `database` | JSON string | Initial database state (S_0) |
| `golden_trajectory` | JSON string | Resolved DAG with oracle tool calls and expected outputs |
| `pass_rate` | float | Pass rate achieved by the calibration model (0.0 - 1.0) |

## Tasks

| Task ID | Pass Rate | Failure Mode |
|---------|-----------|-------------|
| 010 | 0.600 | |
| 015 | 0.200 | |
| 018 | 0.200 | |
| 019 | 0.400 | |
| 027 | 0.200 | |
| 031 | 0.400 | |
| 034 | 0.200 | |
| 039 | 0.400 | |
| 040 | 0.600 | |
| 041 | 0.200 | |
| 042 | 0.200 | |
| 044 | 0.600 | |
| 050 | 0.600 | |

## Failure Mode Analysis

## Usage

```python
import json

from datasets import load_dataset

ds = load_dataset("vibrantlabsai/tau2-infinity", split="test")

for task in ds:
    print(task["task_id"], task["task_description"][:100])

    # Parse structured fields (stored as JSON strings, see Dataset Schema)
    tools = json.loads(task["tools"])
    database = json.loads(task["database"])
    golden = json.loads(task["golden_trajectory"])
```

## License

Apache 2.0
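
As a supplementary note on the calibration figures in the Overview table: the card does not spell out how the numbers are derived, so the following is only a minimal sketch under stated assumptions. It assumes that a task's pass rate is the number of successful runs divided by the 5 evaluation runs, that a generated task is kept when its pass rate falls inside the closed target band [0.2, 0.6], and that the collection rate is kept tasks divided by total iterations. The function names are hypothetical and are not part of EnvScaler or the dataset.

```python
RUNS_PER_TASK = 5         # "Evaluation runs per task" from the Overview table
TARGET_BAND = (0.2, 0.6)  # "Target pass rate" from the Overview table


def pass_rate(successes: int, runs: int = RUNS_PER_TASK) -> float:
    """Fraction of evaluation runs in which the agent reached the golden state."""
    if not 0 <= successes <= runs:
        raise ValueError("successes must be between 0 and runs")
    return successes / runs


def in_target_band(rate: float, band=TARGET_BAND, tol: float = 1e-9) -> bool:
    """Assumed acceptance rule: keep a task if its pass rate lies in the closed band."""
    low, high = band
    return low - tol <= rate <= high + tol


def collection_rate(kept: int, iterations: int) -> float:
    """Assumed definition: share of generation iterations that yielded a kept task."""
    return kept / iterations


if __name__ == "__main__":
    # 1 of 5 and 3 of 5 passes sit on the band edges; 0 of 5 and 4 of 5 fall outside.
    for successes in range(RUNS_PER_TASK + 1):
        rate = pass_rate(successes)
        print(successes, rate, in_target_band(rate))

    # 13 kept tasks out of 50 iterations, matching the 26.0% in the Overview table.
    print(f"{collection_rate(13, 50):.1%}")
```

Under these assumptions, every pass rate is a multiple of 0.2, which is consistent with the values listed in the Tasks table.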

Provider:

vibrantlabsai
