MobileWorld

Name: MobileWorld
Creator: maas
Published: 2026-01-09 15:03:39
License: not specified

ModelScope community. Updated: 2026-01-09. Indexed: 2026-01-10.

Download link:

https://modelscope.cn/datasets/Tongyi-MAI/MobileWorld


Description:

# MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

Mobile World is a substantially more challenging mobile-use benchmark designed to better reflect real-world mobile usage. It comprises 201 tasks across 20 applications, featuring long-horizon, cross-app tasks and novel task categories, including agent-user interaction and MCP-augmented tasks.

The difficulty of Mobile World is twofold:

- Long-horizon, cross-application tasks. Mobile World tasks require 27.8 completion steps on average, nearly twice the 14.3 steps required in AndroidWorld. Moreover, 62.2% of tasks involve cross-application workflows, compared to only 9.5% in AndroidWorld.
- Novel task categories. Mobile World extends beyond standard GUI manipulation by introducing (1) agent-user interaction tasks (22.4%) that evaluate an agent's ability to handle ambiguous instructions through collaborative dialogue, and (2) MCP-augmented tasks (19.9%) that require combining GUI navigation with external tool invocations via the Model Context Protocol.

![image](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/ndnkD2k9sEJrJrrYLzq-B.png)

The system architecture of Mobile World consists of two main components. Left: the host machine, where GUI agents receive task instructions, optionally interact with users for clarification, and then choose between GUI actions and MCP tool calls to complete tasks. Right: the Docker environment, which contains an isolated Android ecosystem with emulators, self-hosted app backends, and an evaluator that verifies task completion through text matching, backend database, local storage, and app callbacks.

![image](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/mhurYcCzPg3Rv_MRrpjKy.png)

In this dataset, we provide an overview of all task goals in our benchmark.
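The host-side choice described above (ask the user, call an MCP tool, or act on the GUI) can be illustrated with a minimal Python sketch. Every name here (`GuiAction`, `McpToolCall`, `AskUser`, `dispatch`, `evaluate`) is hypothetical and is not taken from the MobileWorld codebase. The rule that a task passes only when every configured check passes is also an assumption, since the description does not say how the evaluator combines its signals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class GuiAction:
    """A GUI operation on the emulator, e.g. a tap (hypothetical schema)."""
    kind: str
    target: str


@dataclass
class McpToolCall:
    """An external tool invocation via MCP (hypothetical schema)."""
    tool: str
    arguments: Dict[str, str] = field(default_factory=dict)


@dataclass
class AskUser:
    """A clarification question sent back to the user."""
    question: str


Action = Union[GuiAction, McpToolCall, AskUser]


def dispatch(action: Action, log: List[str]) -> str:
    """Route one agent action to its channel and record the channel name."""
    if isinstance(action, GuiAction):
        channel = "gui"
    elif isinstance(action, McpToolCall):
        channel = "mcp"
    elif isinstance(action, AskUser):
        channel = "user"
    else:
        raise TypeError(f"unsupported action: {action!r}")
    log.append(channel)
    return channel


def evaluate(checks: Dict[str, Callable[[], bool]]) -> bool:
    """Assumed policy: the task passes only if every configured check passes."""
    return all(check() for check in checks.values())


if __name__ == "__main__":
    trace: List[str] = []
    dispatch(AskUser("Which contact do you mean?"), trace)
    dispatch(McpToolCall("search_places", {"query": "cafe"}), trace)
    dispatch(GuiAction("tap", "Send"), trace)
    print(trace)
    print(evaluate({"text_match": lambda: True, "backend_db": lambda: True}))
```

Modelling the three options as separate types keeps the routing explicit. A real agent would replace the log entries with emulator commands, MCP client requests, and a dialogue turn.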
- GitHub: https://github.com/Tongyi-MAI/MobileWorld
- Project Page: https://tongyi-mai.github.io/MobileWorld/
- arXiv: https://arxiv.org/abs/2512.19432
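The headline figures in the description can be cross-checked with a little arithmetic. The sketch below derives approximate task counts from the reported percentages and the total of 201 tasks. These counts are not stated in the description itself; they are inferred from rounded percentages and should be read as estimates. The helper name `approx_count` is illustrative.

```python
TOTAL_TASKS = 201          # reported total number of tasks
MOBILEWORLD_STEPS = 27.8   # reported average steps per task
ANDROIDWORLD_STEPS = 14.3  # reported AndroidWorld average


def approx_count(percent: float, total: int = TOTAL_TASKS) -> int:
    """Estimate how many tasks a reported percentage corresponds to."""
    return round(total * percent / 100)


# Ratio behind the "nearly twice" claim.
step_ratio = MOBILEWORLD_STEPS / ANDROIDWORLD_STEPS

if __name__ == "__main__":
    print(f"step ratio: {step_ratio:.2f}")                     # 1.94
    print(f"cross-app tasks: ~{approx_count(62.2)}")           # ~125
    print(f"agent-user interaction: ~{approx_count(22.4)}")    # ~45
    print(f"MCP-augmented: ~{approx_count(19.9)}")             # ~40
```

A step ratio of about 1.94 is consistent with "nearly twice", and each estimated count, divided by 201, rounds back to the reported percentage.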


Provider:

maas

Created:

2026-01-04
