
RICO-ScreenAnnotation-f

ModelScope Community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/rootsautomation/RICO-ScreenAnnotation-f
Resource description:
# Dataset Card for RICO Screen Annotations

This is a standardization of Google's Screen Annotation dataset on a subset of RICO screens, as described in their ScreenAI paper. Unlike the original, this version converts the integer bounding boxes into floating-point bounding boxes with two-decimal precision.

## Dataset Details

### Dataset Description

This is an image-to-text annotation format first prescribed in Google's ScreenAI paper. The idea is to standardize an expected text output that is reasonable for the model to follow and that fuses tasks such as element detection, referring expression generation/recognition, and element classification.

- **Curated by:** Google Research
- **Language(s) (NLP):** English
- **License:** CC-BY-4.0

### Dataset Sources

- **Repository:** [google-research-datasets/screen_annotation](https://github.com/google-research-datasets/screen_annotation/tree/main)
- **Paper:** [ScreenAI](https://arxiv.org/abs/2402.04615)

## Uses

### Direct Use

Pre-training of multimodal models to better understand screens.

## Dataset Structure

- `screen_id`: Screen ID in the RICO dataset
- `screen_annotation`: Target output string
- `image`: The RICO screenshot

## Dataset Creation

### Curation Rationale

> The Screen Annotation dataset consists of pairs of mobile screenshots and their annotations. The mobile screenshots are directly taken from the publicly available Rico dataset. The annotations are in text format, and contain information on the UI elements present on the screen: their type, their location, the text they contain or a short description. This dataset has been introduced in the paper ScreenAI: A Vision-Language Model for UI and Infographics Understanding and can be used to improve the screen understanding capabilities of multimodal (image+text) models.
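
The card says integer bounding boxes were converted to floats with two-decimal precision but does not spell out the conversion. The sketch below shows one plausible way to do it, not the authors' actual procedure. It assumes the integer coordinates are normalized to a fixed range; the default `scale` of 1000 and the helper name `to_float_bbox` are hypothetical and do not come from this card.

```python
from typing import Sequence, Tuple


def to_float_bbox(box: Sequence[int], scale: int = 1000) -> Tuple[float, ...]:
    """Rescale integer coordinates to floats rounded to two decimals.

    `scale` is an assumed upper bound of the integer coordinate range;
    the dataset card does not state the real value or the coordinate order.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Divide each coordinate by the assumed range, then keep two decimals.
    return tuple(round(value / scale, 2) for value in box)


if __name__ == "__main__":
    # Hypothetical integer box; the values are illustrative only.
    print(to_float_bbox([0, 500, 250, 999]))
```

Check the result against a few real `screen_annotation` strings before relying on the assumed scale.
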
## Citation

**BibTeX:**

```
@misc{baechler2024screenai,
  title={ScreenAI: A Vision-Language Model for UI and Infographics Understanding},
  author={Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma},
  year={2024},
  eprint={2402.04615},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Dataset Card Authors

Hunter Heidenreich, Roots Automation

## Dataset Card Contact

hunter "dot" heidenreich AT rootsautomation `DOT` com

Provider:
maas
Created:
2025-10-14