
vibrantlabsai/tau2-infinity

Source: Hugging Face. Updated 2026-04-21; indexed 2026-05-10.
Download link:
https://hf-mirror.com/datasets/vibrantlabsai/tau2-infinity
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
tags:
- benchmark
- tool-use
- agent
- function-calling
- airline
size_categories:
- n<1K
---

# tau2-infinity

An adaptive benchmark for evaluating LLM tool-use agents on airline customer service tasks. Generated using EnvScaler by VibrantLabs.

## Overview

Each task requires an agent to transform an initial database state **S_0** into a golden final state **S*** by executing a sequence of tool calls (flight searches, bookings, cancellations, updates, etc.). Tasks were adaptively generated to target specific difficulty levels against a calibration model.

| Property | Value |
|----------|-------|
| Number of tasks | 13 |
| Target pass rate | [0.2, 0.6] |
| Achieved avg pass rate | 0.354 |
| Calibration model | `fireworks_ai/accounts/vibrantlabs/deployments/bv8h7e5g` |
| Evaluation runs per task | 5 |
| Total iterations to collect | 50 |
| Collection rate | 26.0% |

## Dataset Schema

| Column | Type | Description |
|--------|------|-------------|
| `task_id` | string | Unique task identifier |
| `task_description` | string | Natural language task the agent must complete |
| `tools` | JSON string | Tool specifications available to the agent |
| `database` | JSON string | Initial database state (S_0) |
| `golden_trajectory` | JSON string | Resolved DAG with oracle tool calls and expected outputs |
| `pass_rate` | float | Pass rate achieved by the calibration model (0.0 - 1.0) |

## Tasks

| Task ID | Pass Rate | Failure Mode |
|---------|-----------|-------------|
| 010 | 0.600 | |
| 015 | 0.200 | |
| 018 | 0.200 | |
| 019 | 0.400 | |
| 027 | 0.200 | |
| 031 | 0.400 | |
| 034 | 0.200 | |
| 039 | 0.400 | |
| 040 | 0.600 | |
| 041 | 0.200 | |
| 042 | 0.200 | |
| 044 | 0.600 | |
| 050 | 0.600 | |

## Failure Mode Analysis

## Usage

```python
import json

from datasets import load_dataset

ds = load_dataset("vibrantlabsai/tau2-infinity", split="test")

for task in ds:
    print(task["task_id"], task["task_description"][:100])

# Parse structured fields
tools = json.loads(task["tools"])
database = json.loads(task["database"])
golden = json.loads(task["golden_trajectory"])
```

## License

Apache 2.0
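
The figures in the Overview and Tasks tables can be sanity-checked without downloading the dataset. The sketch below is illustrative only and is not part of the dataset tooling. It assumes that a task's pass rate is the number of successful runs divided by the 5 evaluation runs, and that the collection rate is the number of tasks kept divided by the total iterations. The helper names (`successes`, `in_target_band`, `collection_rate`) are hypothetical, and the pass rates are copied by hand from the Tasks table.

```python
from collections import Counter

# Pass rates copied from the Tasks table (task ID -> pass rate).
PASS_RATES = {
    "010": 0.6, "015": 0.2, "018": 0.2, "019": 0.4, "027": 0.2,
    "031": 0.4, "034": 0.2, "039": 0.4, "040": 0.6, "041": 0.2,
    "042": 0.2, "044": 0.6, "050": 0.6,
}

RUNS_PER_TASK = 5          # "Evaluation runs per task" in the Overview table
TARGET_BAND = (0.2, 0.6)   # "Target pass rate" in the Overview table
TOTAL_ITERATIONS = 50      # "Total iterations to collect" in the Overview table


def successes(rate: float, runs: int = RUNS_PER_TASK) -> int:
    """Recover the number of passing runs, assuming rate = passes / runs."""
    return round(rate * runs)


def in_target_band(rate: float, band=TARGET_BAND, tol: float = 1e-9) -> bool:
    """Check that a pass rate lies inside the target band (inclusive)."""
    low, high = band
    return low - tol <= rate <= high + tol


def collection_rate(collected: int, iterations: int) -> float:
    """Assumed definition: tasks kept divided by generation iterations."""
    return collected / iterations


# How many tasks passed 1, 2, or 3 of the 5 runs.
by_successes = Counter(successes(r) for r in PASS_RATES.values())
print(dict(sorted(by_successes.items())))

# Every listed task should fall inside the target band.
print(all(in_target_band(r) for r in PASS_RATES.values()))

# 13 tasks out of 50 iterations.
print(f"{collection_rate(len(PASS_RATES), TOTAL_ITERATIONS):.1%}")
```

Under these assumptions, every listed pass rate corresponds to a whole number of passing runs out of 5, and 13 collected tasks over 50 iterations reproduces the stated 26.0% collection rate.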
Provider:
vibrantlabsai