
GUIMid

ModelScope community. Updated 2026-01-06; indexed 2025-04-26.
Download link:
https://modelscope.cn/datasets/hkust-nlp/GUIMid
Description:
<div align="center">
<h1> Breaking the Data Barrier – Building GUI Agents Through Task Generalization </h1>
</div>

<div align="center">

[🐙 GitHub](https://github.com/hkust-nlp/GUIMid) | 📝 [Paper](https://arxiv.org/abs/2504.10127) | [🤗 Mid-training Data](https://huggingface.co/datasets/hkust-nlp/GUIMid/) | [🤗 Post-Training Data](https://huggingface.co/datasets/hkust-nlp/GUIMid/blob/main/GUI_trajectory.json)

</div>

<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/63b76e716fc56e43c3c22ca8/6fepPX_FZRCiqHgypsBMD.png" width="60%" />
</div>

## TODO List

- [ ] Report and release a larger GUIMid covering more domains (expected 10th May)

## 1. Data Overview

The mid-training data is composed of 11 diverse domains: 7 vision-and-language domains and 4 language-only domains.

The table below reports performance when each domain is used as mid-training data:

| Domains                          | Observation                | WebArena (PR) | WebArena (SR) | AndroidWorld (SR) |
|----------------------------------|----------------------------|--------------:|--------------:|------------------:|
| **GUI Post-Training Only**       | Image                      | 26.3          | 6.2           | 9.0               |
| **Public Baselines**             |                            |               |               |                   |
| GPT-4o-2024-11-20                | Image                      | 36.9          | 15.6          | 11.7              |
| OS-Genesis-7B                    | Image + Accessibility Tree | -             | -             | 17.4              |
| AGUVIS-72B                       | Image                      | -             | -             | 26.1              |
| Claude3-Haiku                    | Accessibility Tree         | 26.8          | 12.7          | -                 |
| Llama3-70b                       | Accessibility Tree         | 35.6          | 12.6          | -                 |
| Gemini1.5-Flash                  | Accessibility Tree         | 32.4          | 11.1          | -                 |
| **Vision-and-Language Modality** |                            |               |               |                   |
| Chart/Document QA                | Image                      | 24.6          | 6.2           | 15.3              |
| Non-GUI Perception               | Image                      | 28.7          | 7.6           | 14.0              |
| GUI Perception                   | Image                      | 27.4          | 7.1           | 14.0              |
| Web Screenshot2Code              | Image                      | 28.0          | 6.6           | 9.9               |
| Non-GUI Agents                   | Image                      | 30.8          | 8.5           | 13.5              |
| Multi-modal Math ✓               | Image                      | 30.4          | 8.5           | 15.3              |
| Multi-round Visual Conversation  | Image                      | 30.0          | 9.0           | 12.6              |
| **Language Modality**            |                            |               |               |                   |
| MathInstruct ✓                   | Image                      | 31.9          | 10.9          | 14.4              |
| Olympiad Math ✓                  | Image                      | 31.5          | 8.5           | 13.1              |
| CodeI/O ✓                        | Image                      | 29.2          | 9.0           | 14.9              |
| Web Knowledge Base               | Image                      | 31.3          | 9.5           | 9.0               |
| **Domain Combination (domains with ✓)** |                     |               |               |                   |
| **GUIMid**                       | Image                      | **34.3**      | **9.5**       | **21.2**          |

To help researchers quickly get a sense of the data for each task, we provide **dataset examples** on GitHub: [🤗 GUIMid](https://github.com/hkust-nlp/GUIMid#).

## 2. Download Link

You can download the JSON files with:

```
huggingface-cli download --repo-type dataset --resume-download hkust-nlp/GUIMid --local-dir hkust-nlp/GUIMid
```

and then extract the images with (replace `xxx.tar.gz` with the archive name):

```bash
tar -zxvf xxx.tar.gz
```

**For users with network problems, you can try [HF-Mirror](https://hf-mirror.com/).**

## 3. Data Files Introduction

### Post-Training Data

Our post-training dataset includes multimodal data (text and images) from the mobile and web domains. Text data is in `GUI_trajectory.json`, and images are in `traj.tar.gz`.

### Mid-training data for each domain

We provide **mid-training data** covering **7 vision-language domains** and **4 language-only domains**:

**Vision-Language Domains**

- `Chart_Document_QA.json`
- `GUI_Perception.json`
- `Multi-modal_Math.json`
- `Multi-round_Visual_Conversation.json`
- `Non-GUI_Agents.json`
- `Web_Screenshot2Code.json`
- `Non-GUI_Perception.json`

**Language-Only Domains**

- `CodeIO.json`
- `MathInstruct.json`
- `Olympiad_Math.json`
- `Web_Knowledge_Base.json`

*(Image data for some domains will be released shortly.)*

### GUIMid Data

We provide the combined GUIMid dataset. Text data is in `GUIMid.json`, and images are in `mavis.tar.gz`.

## Citation

If you find this repository helpful, feel free to cite our paper:

```bibtex
@article{zhang2025breaking,
  title={Breaking the Data Barrier--Building GUI Agents Through Task Generalization},
  author={Zhang, Junlei and Ding, Zichen and Ma, Chang and Chen, Zijie and Sun, Qiushi and Lan, Zhenzhong and He, Junxian},
  journal={arXiv preprint arXiv:2504.10127},
  year={2025}
}
```
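Section 3 pairs each JSON file with an image archive (for example `GUI_trajectory.json` with `traj.tar.gz`). The following is a minimal sketch of how one might unpack an archive and take a first look at a JSON file after downloading. The helper names `extract_archive` and `summarize_json` are hypothetical and not part of the GUIMid repository. Because the record schema is not described here, the sketch only reports the top-level JSON type and its length rather than assuming any field names.

```python
import json
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Extract a .tar.gz archive into out_dir and return its member names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out, filter="data")
        else:
            tar.extractall(out)
    return names


def summarize_json(json_path):
    """Report the top-level type and size of a JSON file without assuming a schema."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return {"type": type(data).__name__, "size": size}
```

With the download directory from Section 2, a call such as `extract_archive("hkust-nlp/GUIMid/traj.tar.gz", "images")` followed by `summarize_json("hkust-nlp/GUIMid/GUI_trajectory.json")` would be one way to confirm that the files arrived intact before inspecting individual records. The exact paths depend on where the files were saved.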

Provider:
maas
Created:
2025-04-22