
meituan-longcat/VitaBench

Source: Hugging Face. Updated: 2026-01-27. Indexed: 2025-10-18.
Download link:
https://hf-mirror.com/datasets/meituan-longcat/VitaBench
Overview:
VitaBench is a challenging benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings. It covers everyday applications in food delivery, in-store consumption, and online travel services, providing agents with the most complex life-service simulation environment to date, comprising 66 tools. Because its framework eliminates domain-specific policies, these scenarios and tools can be composed flexibly, yielding 100 cross-scenario tasks and 300 single-scenario tasks. Each task is derived from multiple real user requests and requires agents to reason across temporal and spatial dimensions, use complex tool sets, proactively clarify ambiguous instructions, and track shifting user intent throughout multi-turn conversations.
Provider:
meituan-longcat