ZhuOnR/ScreenSpot
Hugging Face · Updated 2026-04-29 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ZhuOnR/ScreenSpot
Resource description:
ScreenSpot is an evaluation benchmark for GUI grounding. It comprises over 1,200 instructions from iOS, Android, macOS, Windows, and Web environments, each annotated with an element type (Text or Icon/Widget). The dataset is intended for zero-shot evaluation of a multimodal model's ability to ground an instruction locally on a screen, that is, to translate text into a reference to a specific region of the image. Each test sample contains: image, file_name, instruction, bbox, data_type, and data_source. The source data consists of screenshots spanning desktop screens (Windows, macOS), mobile screens (iPhone, iPad, Android), and web screens, collected and annotated by researchers at Nanjing University and Shanghai AI Laboratory.
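As a rough illustration of how the fields listed above might be used for scoring, the sketch below counts a prediction as correct when the predicted click point falls inside the sample's bbox, and reports accuracy per data_type. This listing does not specify the scoring rule or the bbox layout, so both are assumptions: the box is taken to be (x_min, y_min, x_max, y_max) in the same coordinate space as the predicted point, the data_type labels "text" and "icon" are placeholders, and the helper names `point_in_bbox` and `accuracy_by_type` are hypothetical. The samples are made up and are not drawn from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max). The real dataset may differ.
BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]


def point_in_bbox(point: Point, bbox: BBox) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = bbox
    return x_min <= x <= x_max and y_min <= y <= y_max


def accuracy_by_type(samples: Iterable[dict],
                     predictions: Iterable[Point]) -> Dict[str, float]:
    """Fraction of predicted points that land inside the target box,
    grouped by each sample's data_type field."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sample, point in zip(samples, predictions):
        key = sample["data_type"]
        totals[key] += 1
        hits[key] += int(point_in_bbox(point, sample["bbox"]))
    return {key: hits[key] / totals[key] for key in totals}


# Made-up samples for illustration only (not taken from ScreenSpot).
samples = [
    {"instruction": "open settings", "bbox": (0.10, 0.10, 0.20, 0.20), "data_type": "icon"},
    {"instruction": "tap Sign in", "bbox": (0.40, 0.50, 0.60, 0.55), "data_type": "text"},
    {"instruction": "close the tab", "bbox": (0.90, 0.00, 1.00, 0.05), "data_type": "icon"},
]
# Hypothetical model outputs: one predicted click point per sample.
predictions = [(0.15, 0.15), (0.50, 0.52), (0.50, 0.50)]

scores = accuracy_by_type(samples, predictions)
print(scores)
```

Before using anything like this on the real data, check the bbox convention (corner pairs versus origin plus width and height, pixels versus normalized coordinates) against the dataset itself, since a mismatch silently produces wrong scores.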
Provided by:
ZhuOnR



