GUI-Robust
GUI-Robust Dataset Overview
Dataset Introduction
- Name: GUI-Robust
- Purpose: testing the robustness of GUI agents under real-world anomalies
- Full dataset: https://huggingface.co/datasets/kuangtie/GUI-Robust
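Evaluation Script
- Run command: python evaluation.py --model_name <YourModel> --eval_type step|task --task_type normal|abnormal --data_path <path_to_data_folder>
  (pass exactly one of the listed values for --eval_type and for --task_type)
- Evaluation modes:
  - step: evaluates basic per-step accuracy (action accuracy and coordinate accuracy)
  - task: evaluates complete task execution (action accuracy, coordinate accuracy, and task success rate)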
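To make the three metrics named above concrete, here is a minimal sketch of how they could be computed. It is not the logic of evaluation.py, which may differ. The dictionary keys (`action`, `x`, `y`, `bbox`), the rule that a coordinate is correct when it falls inside a ground-truth bounding box, and the definition of task success as "every step correct" are all assumptions made for illustration.

```python
from typing import Dict, List


def action_correct(pred: Dict, gold: Dict) -> bool:
    # Assumed rule: the predicted action type must equal the labelled one.
    return pred.get("action") == gold.get("action")


def coord_correct(pred: Dict, gold: Dict) -> bool:
    # Assumed rule: the predicted point must lie inside the labelled box
    # (left, top, right, bottom). Steps without a box count as correct.
    box = gold.get("bbox")
    if box is None:
        return True
    left, top, right, bottom = box
    return left <= pred["x"] <= right and top <= pred["y"] <= bottom


def step_metrics(preds: List[Dict], golds: List[Dict]) -> Dict[str, float]:
    # Step mode: average each check over all steps.
    if not golds or len(preds) != len(golds):
        raise ValueError("preds and golds must be non-empty and equally long")
    n = len(golds)
    pairs = list(zip(preds, golds))
    return {
        "action_acc": sum(action_correct(p, g) for p, g in pairs) / n,
        "coord_acc": sum(coord_correct(p, g) for p, g in pairs) / n,
    }


def task_success(preds: List[Dict], golds: List[Dict]) -> bool:
    # Task mode (assumed): a task succeeds only if every step passes both checks.
    return len(preds) == len(golds) and all(
        action_correct(p, g) and coord_correct(p, g) for p, g in zip(preds, golds)
    )


gold = [
    {"action": "click", "bbox": (10, 10, 50, 30)},
    {"action": "input", "bbox": (0, 40, 100, 60)},
]
pred = [
    {"action": "click", "x": 20, "y": 20},
    {"action": "click", "x": 50, "y": 50},
]
print(step_metrics(pred, gold))
print(task_success(pred, gold))
```

In this toy example both predicted points fall inside their boxes, but the second action type is wrong, so the task as a whole would not count as successful under the assumed rule.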
Model Integration
Interface Specification
- Single-step prediction method: def pred_step_loc(step_description: str, screenshot_base64: str) -> dict
- Full-task prediction method: def pred_task_full(task_description: str, screenshot_list_base64: List[str]) -> List[dict]
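The two signatures above can be satisfied by a placeholder model like the following sketch, which may help when wiring a new model into the evaluation. The function names and parameters come from the interface specification. The returned dictionary keys and the fixed "wait" prediction are assumptions for illustration; a real model would run inference on the decoded image instead.

```python
import base64
from typing import List


def pred_step_loc(step_description: str, screenshot_base64: str) -> dict:
    # Decode to confirm the screenshot is valid base64; a real model would
    # hand these bytes to its image preprocessor.
    image_bytes = base64.b64decode(screenshot_base64, validate=True)
    if not image_bytes:
        raise ValueError("screenshot is empty")
    # Placeholder prediction. The key names here are hypothetical.
    return {"x": 0, "y": 0, "element_type": "none", "action": "wait", "content": ""}


def pred_task_full(task_description: str, screenshot_list_base64: List[str]) -> List[dict]:
    # Simplest possible strategy: predict one step per screenshot, independently.
    return [pred_step_loc(task_description, shot) for shot in screenshot_list_base64]


shot = base64.b64encode(b"fake image bytes").decode("ascii")
print(pred_task_full("Open the settings page", [shot, shot]))
```

Treating each screenshot independently ignores history across steps, which a task-level model would normally use; it is only meant to show the expected input and output shapes.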
Prediction Output Format
- Element coordinates (x, y)
- Element type (icon, text, box, none)
- Action type and content (click, input, get_info, open, wait, human)
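A quick sanity check on a prediction before passing it to the evaluation could look like the sketch below. The allowed element and action values are taken from the list above. The dictionary key names (`x`, `y`, `element_type`, `action`) are assumptions, since the exact field names are not given here.

```python
ELEMENT_TYPES = {"icon", "text", "box", "none"}
ACTION_TYPES = {"click", "input", "get_info", "open", "wait", "human"}


def validate_prediction(pred: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks valid."""
    errors = []
    for key in ("x", "y"):
        value = pred.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key} must be a number")
    if pred.get("element_type") not in ELEMENT_TYPES:
        errors.append("element_type must be one of " + ", ".join(sorted(ELEMENT_TYPES)))
    if pred.get("action") not in ACTION_TYPES:
        errors.append("action must be one of " + ", ".join(sorted(ACTION_TYPES)))
    return errors


print(validate_prediction({"x": 120, "y": 48, "element_type": "icon", "action": "click"}))
print(validate_prediction({"x": "left", "y": 48, "element_type": "button", "action": "tap"}))
```

Returning a list of problems rather than raising on the first one makes it easier to log every malformed field in a batch of predictions.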
Citation
@inproceedings{yang2025guirobust,
  title={GUI-Robust: A Comprehensive Dataset for Testing GUI Agent Robustness in Real-World Anomalies},
  author={Jingqi Yang and Zhilong Song and Jiawei Chen and Mingli Song and Sheng Zhou and Linjun Sun and Xiaogang Ouyang and Chun Chen and Can Wang},
  booktitle={NeurIPS Datasets and Benchmarks Track},
  year={2025},
  url={https://openreview.net/forum?id=22gw3kITCd},
}




