Mobile-Bench-v2

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/xwk123/MobileBench-v2


Resource summary:

Mobile-Bench-v2 is designed to evaluate mobile agents built on vision-language models (VLMs). Its task splits cover common, noisy, and ambiguous instructions, testing how agents perform in dynamic environments. Dedicated splits assess how well agents handle noisy instructions, ambiguous instructions, and proactive interaction. The evaluation metrics are success rate, step efficiency, accuracy, and type matching. The dataset's scale follows from the filtering and annotation of Mobile3M, and it includes a subset split named Random-800. Its core purpose is to evaluate the multi-path performance of mobile agents on graphical user interface (GUI) tasks under different types of instructions.
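This page names the metrics but does not define them. The sketch below shows one common way such metrics are computed for GUI-agent benchmarks. It assumes a single reference path per task, and the `Step` and `Episode` record types and their fields are hypothetical. The actual Mobile-Bench-v2 definitions, including how multiple valid paths are scored, may differ and should be taken from the paper.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record types: field names are illustrative assumptions,
# not the dataset's actual schema.
@dataclass
class Step:
    pred_type: str    # predicted action type, e.g. "click" or "type"
    gold_type: str    # reference action type
    pred_target: str  # predicted target (element id, typed text, ...)
    gold_target: str  # reference target


@dataclass
class Episode:
    steps: List[Step]  # actions the agent actually took
    success: bool      # whether the agent completed the task
    optimal_len: int   # length of the (single, assumed) reference path


def evaluate(episodes: List[Episode]) -> dict:
    """Compute illustrative episode- and step-level metrics."""
    all_steps = [s for ep in episodes for s in ep.steps]
    n = len(all_steps)

    # Type match: predicted action type equals the reference type.
    type_match = sum(s.pred_type == s.gold_type for s in all_steps) / n if n else 0.0

    # Step accuracy: both action type and target match the reference.
    accuracy = (
        sum(s.pred_type == s.gold_type and s.pred_target == s.gold_target
            for s in all_steps) / n
        if n else 0.0
    )

    # Success rate: fraction of episodes marked as completed.
    success_rate = sum(ep.success for ep in episodes) / len(episodes) if episodes else 0.0

    # Step efficiency: reference length / steps taken (capped at 1.0),
    # averaged over successful episodes only.
    successful = [ep for ep in episodes if ep.success and ep.steps]
    step_efficiency = (
        sum(min(1.0, ep.optimal_len / len(ep.steps)) for ep in successful) / len(successful)
        if successful else 0.0
    )

    return {
        "success_rate": success_rate,
        "step_efficiency": step_efficiency,
        "accuracy": accuracy,
        "type_match": type_match,
    }


# Toy example: one successful and one failed episode.
example = [
    Episode([Step("click", "click", "a", "a"), Step("type", "type", "hi", "hi")],
            success=True, optimal_len=2),
    Episode([Step("click", "scroll", "b", "c"), Step("click", "click", "d", "e")],
            success=False, optimal_len=1),
]
result = evaluate(example)
print(result)
```

For this toy input, three of the four steps match on action type, two match on both type and target, and the single successful episode used exactly the reference number of steps.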

Dataset introduction

Background overview

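Mobile-Bench-v2 is a multi-subset dataset. Its task types include the ambiguous category, common-simple, and common-complex, and it also provides Noisy data and an aitz contamination noisy split. The dataset must be used together with Mobile3M; both a sample version and a full version are provided for reference.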

The content above was collected and summarized by 遇见数据集.
