OpenGVLab/MMBench-GUI
Source: Hugging Face. Updated: 2025-08-15. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/OpenGVLab/MMBench-GUI
Description:
MMBench-GUI is a hierarchical, multi-platform benchmark framework and toolkit for evaluating GUI agents. It defines four evaluation levels: GUI Content Understanding, GUI Element Grounding, GUI Task Automation, and GUI Task Collaboration. The framework also proposes the Efficiency-Quality Area (EQA) metric, which combines accuracy and efficiency in agent navigation. The dataset is built on VLMEvalKit and supports evaluating models either through an API or through local deployment. At present, only the images and JSON files for Level 1 and Level 2 have been open-sourced.
Provider:
OpenGVLab
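
Fetching the dataset: the listing above gives only a repository ID and a mirror URL, so the following is a minimal sketch of one way to download the files, assuming the `huggingface_hub` package is installed. The function name `download_mmbench_gui` and the `local_dir` and `endpoint` parameters are hypothetical names introduced for illustration. The listing does not describe the repository's file layout, so the sketch downloads the whole snapshot rather than specific files.

```python
import os
from typing import Optional

# Dataset ID taken from the listing title.
REPO_ID = "OpenGVLab/MMBench-GUI"


def download_mmbench_gui(local_dir: str, endpoint: Optional[str] = None) -> str:
    """Download a snapshot of the dataset repo and return its local path.

    `endpoint` is an optional Hub mirror (for example the hf-mirror.com host
    from the download link). This helper is an illustrative assumption, not
    part of MMBench-GUI or VLMEvalKit.
    """
    if endpoint is not None:
        # huggingface_hub reads HF_ENDPOINT when it is first imported, so the
        # override only takes effect if the library has not been imported
        # earlier in this process.
        os.environ["HF_ENDPOINT"] = endpoint

    # Imported lazily so that the endpoint override above is seen.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # the repo is a dataset, not a model
        local_dir=local_dir,
    )
```

Calling `download_mmbench_gui("./MMBench-GUI", endpoint="https://hf-mirror.com")` would request the snapshot through the mirror. Omitting `endpoint` uses the default Hugging Face Hub.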



