
CAGUI

ModelScope community · Updated 2026-05-07 · Indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/OpenBMB/CAGUI
Resource description:
# CAGUI: **C**hinese **A**ndroid **GUI** Benchmark

A real-world Chinese Android GUI benchmark designed to evaluate **GUI agent models** on two complementary capabilities:

* **Grounding** – understanding individual GUI components and linking them to semantics.
* **Agent** – planning and executing multi-step actions to complete user goals on Chinese Android apps.

See [AgentCPM-GUI](https://github.com/OpenBMB/AgentCPM-GUI) for more details.

---

## 🌟 Key Features

| Aspect | Grounding | Agent |
|--------|-----------|-------|
| **Objective** | GUI widgets grounding / OCR text | Follow natural-language instructions to operate an app |
| **Data** | 2 × JSONL files (`cap.jsonl`, `ocr.jsonl`) + screenshots | Per-episode JSON + step-level screenshots |
| **Actions** | _n/a_ | Tap, long-tap, text-input, scroll, etc. (`result_action_type`, see [here](https://github.com/OpenBMB/AgentCPM-GUI/blob/main/eval/utils/action_type.py)) |

---

## 🗂️ Repository Structure

```
CAGUI
├─ CAGUI_agent
│  └─ domestic/
│     └─ <episode_id>/
│        ├─ <episode_id>.json      # episode file
│        ├─ <episode_id>_0.jpeg    # step-0 screenshot
│        ├─ <episode_id>_1.jpeg
│        └─ ...
└─ CAGUI_grounding
   ├─ code/
   │  ├─ cap.jsonl     # function to point & text to point
   │  └─ ocr.jsonl     # bbox to text
   └─ images/
      ├─ cap/
      │  ├─ 0.jpeg
      │  └─ ...
      └─ ocr/
         ├─ 0.jpeg
         └─ ...
```

---

## 📑 Data Format

### 1. Agent episodes (`CAGUI_agent/domestic/<episode_id>/<episode_id>.json`)

Each file is a **list of steps**:

| Field | Type | Description |
|-------|------|-------------|
| `episode_id` | str | Unique id |
| `episode_length` | int | Total steps |
| `step_id` | int | Step index (0-based) |
| `instruction` | str | User goal |
| `image_path` | str | Relative path to screenshot |
| `image_width` / `image_height` | int | Raw resolution |
| `ui_positions` | str (JSON list) | Normalised `[[y, x, h, w], …]` UI element boxes |
| `result_action_type` | int | Action code, see [here](https://github.com/OpenBMB/AgentCPM-GUI/blob/main/eval/utils/action_type.py) |
| `result_action_text` | str | Text typed (if any) |
| `result_touch_yx` / `result_lift_yx` | str | Normalised touch coords, `[-1,-1]` if no touch |
| `duration` | float \| null | Action time (s) |

### 2. Grounding annotations (`cap.jsonl`, `ocr.jsonl`)

One JSON object per line:

| Field | Example | Description |
|-------|---------|-------------|
| `task` | `"bbox2function"` / `"bbox2text"` | Sub-task type |
| `image` | `"grounding_eval/dataset/images/0.jpeg"` | Screenshot path |
| `id` | `0` | Unique int id |
| `abs_position` | `"<x1, y1, x2, y2>"` | Pixel-level bbox |
| `rel_position` | `"<x1, y1, x2, y2>"` | Normalised bbox |
| `text` | `"UI元素是一个菜单按钮…"` | Target description / OCR string |

---

## 📜 License

CAGUI is released under the **CC-BY-NC 4.0** license for **non-commercial research**. Screenshots originate from publicly available Chinese apps and are used under fair-use for research purposes only. Remove them if local regulations require.

---

## ✏️ Citation

This benchmark is used for evaluating [AgentCPM-GUI](https://github.com/OpenBMB/AgentCPM-GUI). If our model and benchmark are useful for your research, please cite:

```bibtex
@article{zhang2025agentcpmgui,
  title={Agent{CPM}-{GUI}: Building Mobile-Use Agents with Reinforcement Fine-Tuning},
  author={Zhong Zhang and Yaxi Lu and Yikun Fu and Yupeng Huo and Shenzhi Yang and Yesai Wu and Han Si and Xin Cong and Haotian Chen and Yankai Lin and Jie Xie and Wei Zhou and Wang Xu and Yuanheng Zhang and Zhou Su and Zhongwu Zhai and Xiaoming Liu and Yudong Mei and Jianming Xu and Hongyan Tian and Chongyi Wang and Chi Chen and Yuan Yao and Zhiyuan Liu and Maosong Sun},
  year={2025},
  journal={arXiv preprint arXiv:2506.01391},
}
```

---
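
The agent step fields listed under "Agent episodes" could be decoded roughly as follows. This is a minimal sketch, not the official AgentCPM-GUI loader. It assumes that `ui_positions`, `result_touch_yx`, and `result_lift_yx` are JSON-encoded strings, and that "normalised" means fractions of the image height and width in [0, 1]; the README states neither explicitly. The helper names (`load_episode`, `parse_yx`, `to_pixels`, `describe_step`) and every value in `example_step` are hypothetical.

```python
import json


def load_episode(path):
    """Read one episode file; per the README it holds a list of step dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def parse_yx(raw):
    """Parse a '[y, x]' string; return None for the '[-1,-1]' no-touch value."""
    y, x = json.loads(raw)
    if y == -1 and x == -1:
        return None
    return float(y), float(x)


def to_pixels(yx, width, height):
    """Scale a (y, x) pair to integer (x, y) pixels.

    Assumption: the pair holds fractions of image height/width in [0, 1].
    """
    y, x = yx
    return round(x * width), round(y * height)


def describe_step(step):
    """Summarise one step: action code, touch/lift pixels, number of UI boxes."""
    width, height = step["image_width"], step["image_height"]
    touch = parse_yx(step["result_touch_yx"])
    lift = parse_yx(step["result_lift_yx"])
    boxes = json.loads(step["ui_positions"])  # list of [y, x, h, w] boxes
    return {
        "step_id": step["step_id"],
        "action_type": step["result_action_type"],
        "touch_px": to_pixels(touch, width, height) if touch is not None else None,
        "lift_px": to_pixels(lift, width, height) if lift is not None else None,
        "num_ui_boxes": len(boxes),
    }


# Hypothetical step record, for illustration only (not taken from the dataset).
example_step = {
    "episode_id": "demo",
    "episode_length": 1,
    "step_id": 0,
    "instruction": "open settings",
    "image_path": "demo/demo_0.jpeg",
    "image_width": 1080,
    "image_height": 2400,
    "ui_positions": "[[0.2, 0.4, 0.1, 0.2]]",
    "result_action_type": 4,  # placeholder code; see action_type.py for real values
    "result_action_text": "",
    "result_touch_yx": "[0.25, 0.5]",
    "result_lift_yx": "[0.25, 0.5]",
    "duration": None,
}

summary = describe_step(example_step)
print(summary)
```

If the coordinates in the released files turn out to use a different scale, only `to_pixels` would need to change.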

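
The grounding annotation format (one JSON object per line in `cap.jsonl` and `ocr.jsonl`) could be read along these lines. This is a hedged sketch rather than the benchmark's own evaluation code. It assumes that `abs_position` and `rel_position` hold four comma-separated numbers inside angle brackets, as the `"<x1, y1, x2, y2>"` placeholder suggests. The function names and the numbers in `sample_line` are hypothetical; the `image` path and `text` value reuse the examples from the README's field table.

```python
import json
import re

# Matches integers or decimals, optionally negative.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_bbox(raw):
    """Turn a '<x1, y1, x2, y2>' string into a tuple of four floats."""
    numbers = [float(n) for n in _NUMBER.findall(raw)]
    if len(numbers) != 4:
        raise ValueError(f"expected 4 numbers in bbox string, got {raw!r}")
    return tuple(numbers)


def parse_annotation(line):
    """Decode one JSONL line into a dict with parsed bounding boxes."""
    record = json.loads(line)
    return {
        "id": record["id"],
        "task": record["task"],
        "image": record["image"],
        "abs_bbox": parse_bbox(record["abs_position"]),
        "rel_bbox": parse_bbox(record["rel_position"]),
        "text": record["text"],
    }


def read_annotations(path):
    """Yield parsed annotations from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_annotation(line)


# Hypothetical line, for illustration only; the bbox numbers are made up.
sample_line = json.dumps(
    {
        "task": "bbox2function",
        "image": "grounding_eval/dataset/images/0.jpeg",
        "id": 0,
        "abs_position": "<10, 20, 110, 220>",
        "rel_position": "<0.01, 0.02, 0.11, 0.22>",
        "text": "UI元素是一个菜单按钮…",
    },
    ensure_ascii=False,
)

parsed = parse_annotation(sample_line)
print(parsed["task"], parsed["abs_bbox"])
```

Parsing the numbers with a regular expression keeps the sketch tolerant of spacing differences inside the angle brackets.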
Provider:
maas
Created:
2025-05-15
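Background overview
CAGUI is a Chinese Android GUI benchmark dataset for evaluating GUI agent models on grounding (basic GUI understanding) and multi-step operation. It comprises JSONL annotation files and screenshots and is intended for non-commercial research.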
The summary above was collected and generated by 遇见数据集.