MobileWorld
Source: ModelScope community. Updated: 2026-01-09. Indexed: 2026-01-10.
Download link:
https://modelscope.cn/datasets/Tongyi-MAI/MobileWorld
Description:
# MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments
MobileWorld is a substantially more challenging mobile-use benchmark designed to better reflect real-world mobile usage. It comprises 201 tasks across 20 applications and features long-horizon, cross-app tasks as well as two novel task categories: agent-user interaction tasks and MCP-augmented tasks.
The difficulty of MobileWorld is twofold:
- Long-horizon, cross-application tasks. MobileWorld tasks take 27.8 steps to complete on average, nearly twice the 14.3 steps required in AndroidWorld. Moreover, 62.2% of tasks involve cross-application workflows, compared with only 9.5% in AndroidWorld.
- Novel task categories. MobileWorld extends beyond standard GUI manipulation by introducing (1) agent-user interaction tasks (22.4%), which evaluate an agent's ability to resolve ambiguous instructions through collaborative dialogue, and (2) MCP-augmented tasks (19.9%), which require hybrid use of GUI navigation and external tool invocation via the Model Context Protocol (MCP).
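As a quick sanity check on the figures quoted above, the percentages can be converted into approximate task counts out of the 201 tasks. This is only a sketch: the counts are derived here by rounding and are not stated in the dataset card, and the dictionary keys are illustrative names.

```python
# Figures quoted in the description above.
TOTAL_TASKS = 201
shares_percent = {
    "cross_app": 62.2,
    "agent_user_interaction": 22.4,
    "mcp_augmented": 19.9,
}


def approx_count(percent: float, total: int = TOTAL_TASKS) -> int:
    """Convert a percentage share into a rounded task count."""
    return round(total * percent / 100)


# Derived (not source-stated) counts per category.
counts = {name: approx_count(p) for name, p in shares_percent.items()}

# Ratio of average steps: MobileWorld (27.8) vs. AndroidWorld (14.3).
step_ratio = round(27.8 / 14.3, 2)

print(counts)      # {'cross_app': 125, 'agent_user_interaction': 45, 'mcp_augmented': 40}
print(step_ratio)  # 1.94
```

Each rounded count reproduces the quoted percentage when divided by 201, so the three shares are mutually consistent with the stated task total.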

The system architecture of MobileWorld consists of two main components. The host machine (left side of the architecture diagram) is where GUI agents receive task instructions, optionally interact with users for clarification, and then choose between GUI actions and MCP tool calls to complete tasks. The Docker environment (right side) contains an isolated Android ecosystem with emulators, self-hosted app backends, and an evaluator that verifies task completion through text matching, backend database checks, local storage inspection, and app callbacks.
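The two-component design described above can be illustrated with a small sketch. Everything here is hypothetical: the class names, the `route` and `evaluate` helpers, and the rule that all verification signals must pass are assumptions made for illustration, not the MobileWorld codebase or its API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class GuiAction:
    """A GUI step sent to the emulator (hypothetical shape)."""
    kind: str
    args: Dict[str, object] = field(default_factory=dict)


@dataclass
class McpToolCall:
    """An external tool invocation via MCP (hypothetical shape)."""
    tool: str
    arguments: Dict[str, object] = field(default_factory=dict)


@dataclass
class AskUser:
    """A clarification question addressed to the user."""
    question: str


Action = Union[GuiAction, McpToolCall, AskUser]


def route(action: Action) -> str:
    """Return the channel that would handle the agent's chosen action."""
    if isinstance(action, GuiAction):
        return "emulator"
    if isinstance(action, McpToolCall):
        return "mcp"
    if isinstance(action, AskUser):
        return "user"
    raise TypeError(f"unsupported action: {action!r}")


def evaluate(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run each named verification signal and record its result."""
    return {name: bool(check()) for name, check in checks.items()}


def task_succeeded(results: Dict[str, bool]) -> bool:
    # Assumption: a task passes only if every configured signal passes.
    return bool(results) and all(results.values())


# Example episode: clarify, call a tool, then act on the GUI.
trace = [
    AskUser("Which contact should receive the file?"),
    McpToolCall("search_files", {"query": "report"}),
    GuiAction("tap", {"x": 120, "y": 640}),
]
routes = [route(a) for a in trace]

# Stubbed verification signals named after those listed in the description.
results = evaluate({
    "text_match": lambda: True,
    "backend_database": lambda: True,
    "local_storage": lambda: True,
    "app_callback": lambda: False,
})
print(routes)                   # ['user', 'mcp', 'emulator']
print(task_succeeded(results))  # False
```

Modeling the three choices as separate types keeps the host-side dispatch explicit, and keeping the verification signals as named callables lets a task enable only the checks that apply to it.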

In this dataset, we provide an overview of all task goals in our benchmark.
- GitHub: https://github.com/Tongyi-MAI/MobileWorld
- Project Page: https://tongyi-mai.github.io/MobileWorld/
- arXiv: https://arxiv.org/abs/2512.19432
Provider:
maas
Created:
2026-01-04



