Spider2-V
Source: arXiv. Indexed: 2025-09-30
Download link:
https://spider2-v.github.io
Resource overview:
Spider2-V is a multimodal agent benchmark focused on professional data science and engineering workflows. It comprises 494 real-world tasks, derived from real use cases, that run in authentic computer environments and span 20 enterprise-level professional applications. The benchmark evaluates how well multimodal agents can automate data workflows, both by writing code and by operating the graphical user interfaces (GUIs) of enterprise data software systems.



