
MobileWorld

ModelScope community · Updated 2026-01-09 · Indexed 2026-01-10
Download link:
https://modelscope.cn/datasets/Tongyi-MAI/MobileWorld
Resource description:
# MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive and MCP-Augmented Environments

Mobile World is a substantially more challenging mobile-use benchmark designed to better reflect real-world mobile usage. It comprises 201 tasks across 20 applications, featuring long-horizon, cross-app tasks and novel task categories, including agent-user interaction and MCP-augmented tasks.

The difficulty of Mobile World is twofold:

- Long-horizon, cross-application tasks. Mobile World tasks require 27.8 completion steps on average, nearly twice the 14.3 steps required in AndroidWorld. Moreover, 62.2% of tasks involve cross-application workflows, compared to only 9.5% in AndroidWorld.
- Novel task categories. Mobile World extends beyond standard GUI manipulation by introducing (1) agent-user interaction tasks (22.4%) that evaluate an agent's ability to handle ambiguous instructions through collaborative dialogue, and (2) MCP-augmented tasks (19.9%) that require hybrid use of GUI navigation and external tool invocations via the Model Context Protocol.

![image](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/ndnkD2k9sEJrJrrYLzq-B.png)

The system architecture of Mobile World consists of two main components. Left: the host machine, where GUI agents receive task instructions, optionally interact with users for clarification, and then choose between GUI actions and MCP tool calls to complete tasks. Right: the Docker environment, which contains an isolated Android ecosystem with emulators, self-hosted app backends, and an evaluator that verifies task completion through text matching, backend database, local storage, and app callbacks.

![image](https://cdn-uploads.huggingface.co/production/uploads/63525c3a6cfb8f1498127a34/mhurYcCzPg3Rv_MRrpjKy.png)

In this dataset, we provide an overview of all task goals in our benchmark.
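
The percentages quoted in the description can be converted into approximate task counts. The following is a minimal sketch that uses only the figures stated above; the rounded counts and the step ratio are derived here and are not reported by the source, and the source does not say whether the categories overlap, so the counts should not be summed.

```python
# Figures quoted in the dataset description.
TOTAL_TASKS = 201
SHARES = {
    "cross_app": 0.622,
    "agent_user_interaction": 0.224,
    "mcp_augmented": 0.199,
}
MOBILEWORLD_AVG_STEPS = 27.8
ANDROIDWORLD_AVG_STEPS = 14.3


def approx_counts(total: int, shares: dict[str, float]) -> dict[str, int]:
    """Round each share of the total to the nearest whole task."""
    return {name: round(total * share) for name, share in shares.items()}


# Derived values (assumptions of this sketch, not figures from the source).
counts = approx_counts(TOTAL_TASKS, SHARES)
step_ratio = MOBILEWORLD_AVG_STEPS / ANDROIDWORLD_AVG_STEPS

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: ~{n} of {TOTAL_TASKS} tasks")
    print(f"average steps vs AndroidWorld: {step_ratio:.2f}x")
```

Rounding back, each derived count reproduces the quoted percentage to one decimal place, which is a quick consistency check on the stated numbers.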
- GitHub: https://github.com/Tongyi-MAI/MobileWorld
- Project page: https://tongyi-mai.github.io/MobileWorld/
- arXiv: https://arxiv.org/abs/2512.19432
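
The host-side flow described above, where an agent either asks the user for clarification, calls an MCP tool, or performs a GUI action, can be pictured as a three-way dispatch. The sketch below is illustrative only: `GuiAction`, `McpToolCall`, `AskUser`, `dispatch`, and the stub handlers are hypothetical names invented for this example and are not taken from the MobileWorld codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class GuiAction:
    """Hypothetical GUI step, e.g. a tap or text entry on the emulator."""
    kind: str
    target: str


@dataclass
class McpToolCall:
    """Hypothetical external tool invocation made via MCP."""
    tool: str
    arguments: dict = field(default_factory=dict)


@dataclass
class AskUser:
    """Hypothetical clarifying question sent to the user."""
    question: str


Step = Union[GuiAction, McpToolCall, AskUser]


def dispatch(
    step: Step,
    gui: Callable[[GuiAction], str],
    mcp: Callable[[McpToolCall], str],
    user: Callable[[AskUser], str],
) -> str:
    """Route one agent decision to the matching channel and return its observation."""
    if isinstance(step, GuiAction):
        return gui(step)
    if isinstance(step, McpToolCall):
        return mcp(step)
    if isinstance(step, AskUser):
        return user(step)
    raise TypeError(f"unsupported step: {step!r}")


# Stub handlers standing in for the emulator, an MCP client, and a simulated user.
def fake_gui(action: GuiAction) -> str:
    return f"gui:{action.kind}:{action.target}"


def fake_mcp(call: McpToolCall) -> str:
    return f"mcp:{call.tool}"


def fake_user(ask: AskUser) -> str:
    return "user:reply"


plan = [
    AskUser("Which contact should receive the message?"),
    McpToolCall("search", {"query": "meeting address"}),
    GuiAction("tap", "send_button"),
]
trace = [dispatch(step, fake_gui, fake_mcp, fake_user) for step in plan]
```

Keeping the three channels as separate step types makes it explicit which tasks exercise agent-user interaction or MCP usage, which mirrors how the benchmark separates those task categories.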

Provider:
maas
Created:
2026-01-04