rootsautomation/RICO-Screen2Words

Name: rootsautomation/RICO-Screen2Words
Creator: rootsautomation
Published: 2024-04-16 18:54:01
License: CC-BY-4.0

Source: Hugging Face. Updated: 2024-04-16. Indexed: 2024-05-25.

Download link:

https://hf-mirror.com/datasets/rootsautomation/RICO-Screen2Words

Resource description:

---
language:
- en
license: cc-by-4.0
size_categories:
- 10K<n<100K
task_categories:
- image-to-text
pretty_name: Screen2Words
tags:
- screens
- mobile
- phones
dataset_info:
  features:
  - name: screenId
    dtype: int64
  - name: captions
    sequence: string
  - name: file_name
    dtype: string
  - name: app_package_name
    dtype: string
  - name: play_store_name
    dtype: string
  - name: category
    dtype: string
  - name: average_rating
    dtype: float64
  - name: number_of_ratings
    dtype: string
  - name: number_of_downloads
    dtype: string
  - name: file_name_icon
    dtype: string
  - name: file_name_semantic
    dtype: string
  - name: semantic_annotations
    dtype: string
  - name: view_hierarchy
    dtype: string
  - name: image
    dtype: image
  - name: image_icon
    dtype: image
  - name: image_semantic
    dtype: image
  splits:
  - name: train
    num_bytes: 3618314253.896
    num_examples: 15743
  - name: val
    num_bytes: 520496985.148
    num_examples: 2364
  - name: test
    num_bytes: 956009390.03
    num_examples: 4310
  download_size: 2473562659
  dataset_size: 5094820629.073999
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
  - split: test
    path: data/test-*
---

# Dataset Card for Screen2Words

Screen2Words is a dataset providing screen summaries (i.e., image captions for mobile screens). It uses the RICO image database.

## Dataset Details

### Dataset Description

- **Curated by:** Google Research, UIUC, Northwestern, University of Toronto
- **Funded by:** Google Research
- **Shared by:** Google Research
- **Language(s) (NLP):** English
- **License:** CC-BY-4.0

### Dataset Sources

- **Repository:**
  - [google-research-datasets/screen2words](https://github.com/google-research-datasets/screen2words)
  - [RICO raw downloads](http://www.interactionmining.org/rico.html)
- **Paper:**
  - [Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning](https://arxiv.org/abs/2108.03353)
  - [Rico: A Mobile App Dataset for Building Data-Driven Design Applications](https://dl.acm.org/doi/10.1145/3126594.3126651)

## Uses

This dataset is for developing multimodal automations for mobile screens.

### Direct Use

- Automatic screen summarization & description
- Language-based UI retrieval (given a UI, retrieve similar interfaces)
- Enhancing screen readers
- Screen indexing
- Conversational mobile applications

## Dataset Structure

- `screenId`: Unique RICO screen ID
- `image`: RICO screenshot
- `image_icon`: Google Play Store icon for the app
- `image_semantic`: Semantic RICO screenshot; details are abstracted away to the main visual UI elements
- `file_name`: Local filename of the screenshot image
- `file_name_icon`: Local filename of the icon image
- `file_name_semantic`: Local filename of the semantically annotated screenshot image
- `captions`: A list of string captions
- `app_package_name`: Android package name
- `play_store_name`: Google Play Store name
- `category`: Category of the app
- `number_of_downloads`: Number of downloads of the app (as a coarse range string)
- `number_of_ratings`: Number of ratings of the app on the Google Play Store (as of collection)
- `average_rating`: Average rating of the app on the Google Play Store (as of collection)
- `semantic_annotations`: Reduced view hierarchy, limited to the semantically relevant portions of the full view hierarchy. It corresponds to what is visualized in `image_semantic` and has a lot of detail about what is on screen. It is stored as a JSON object string.
- `view_hierarchy`: Full view hierarchy

## Dataset Creation

### Curation Rationale

- RICO rationale: Create a broad dataset that can be used for UI automation. An explicit goal was to develop automation software that can validate an app's design and assess whether it achieves its stated goal.
- Screen2Words rationale: Create a dataset that facilitates the distillation of screenshots into concise summaries.

### Source Data

- RICO: Mobile app screenshots, collected on Android devices.
- Screen2Words: Human-annotated screen summaries from paid contractors.

#### Data Collection and Processing

- RICO: Human and automated collection of Android screens. ~9.8k free apps from the Google Play Store.
- Screen2Words: Takes the subset of screens used in RICO-SCA, which eliminates screens with missing or inaccurate view hierarchies.

#### Who are the source data producers?

- RICO: 13 human workers (10 from the US, 3 from the Philippines) through UpWork.
- Screen2Words: 85 professional annotators

## Citation

### RICO

**BibTeX:**

```misc
@inproceedings{deka2017rico,
  title={Rico: A mobile app dataset for building data-driven design applications},
  author={Deka, Biplab and Huang, Zifeng and Franzen, Chad and Hibschman, Joshua and Afergan, Daniel and Li, Yang and Nichols, Jeffrey and Kumar, Ranjitha},
  booktitle={Proceedings of the 30th annual ACM symposium on user interface software and technology},
  pages={845--854},
  year={2017}
}
```

**APA:**

Deka, B., Huang, Z., Franzen, C., Hibschman, J., Afergan, D., Li, Y., ... & Kumar, R. (2017, October). Rico: A mobile app dataset for building data-driven design applications. In Proceedings of the 30th annual ACM symposium on user interface software and technology (pp. 845-854).

### Screen2Words

**BibTeX:**

```misc
@inproceedings{wang2021screen2words,
  title={Screen2words: Automatic mobile UI summarization with multimodal learning},
  author={Wang, Bryan and Li, Gang and Zhou, Xin and Chen, Zhourong and Grossman, Tovi and Li, Yang},
  booktitle={The 34th Annual ACM Symposium on User Interface Software and Technology},
  pages={498--510},
  year={2021}
}
```

**APA:**

Wang, B., Li, G., Zhou, X., Chen, Z., Grossman, T., & Li, Y. (2021, October). Screen2words: Automatic mobile UI summarization with multimodal learning. In The 34th Annual ACM Symposium on User Interface Software and Technology (pp. 498-510).

## Dataset Card Authors

Hunter Heidenreich, Roots Automation

## Dataset Card Contact

hunter "DOT" heidenreich "AT" rootsautomation "DOT" com

Provider:

rootsautomation

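Summary of original information

Dataset Card for Screen2Words

Screen2Words is a dataset providing screen summaries (i.e., image captions for mobile screens). It uses the RICO image database.

Dataset details

Dataset description

Language (NLP): English
License: CC-BY-4.0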

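Dataset structure

screenId: Unique RICO screen ID
image: RICO screenshot
image_icon: Google Play Store icon for the app
image_semantic: Semantic RICO screenshot; details are abstracted away to the main visual UI elements
file_name: Local filename of the screenshot image
file_name_icon: Local filename of the icon image
file_name_semantic: Local filename of the semantically annotated image
captions: List of string captions
app_package_name: Android package name
play_store_name: Google Play Store name
category: Category of the app
number_of_downloads: Number of downloads of the app (as a coarse range string)
number_of_ratings: Number of ratings of the app on the Google Play Store (as of collection)
average_rating: Average rating of the app on the Google Play Store (as of collection)
semantic_annotations: Reduced view hierarchy containing the semantically relevant portions visualized in image_semantic; stored as a JSON object string
view_hierarchy: Full view hierarchy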
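
Because `semantic_annotations` is stored as a JSON object string, it has to be decoded before the tree can be inspected. The following is a minimal sketch of that step, not the dataset authors' tooling. The sample record is hypothetical, and the key names `componentLabel` and `children` are assumptions modeled on RICO-style annotations; check them against a real record before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample. The keys "componentLabel", "bounds", and "children"
# are assumptions, not a schema confirmed by the dataset card.
SAMPLE = json.dumps({
    "class": "android.widget.FrameLayout",
    "bounds": [0, 0, 1440, 2560],
    "children": [
        {
            "componentLabel": "Toolbar",
            "bounds": [0, 84, 1440, 280],
            "children": [
                {"componentLabel": "Icon", "bounds": [0, 84, 196, 280]},
                {"componentLabel": "Text", "bounds": [252, 140, 700, 224]},
            ],
        },
        {"componentLabel": "Text", "bounds": [56, 320, 1384, 420]},
    ],
})


def count_labels(annotation_json: str, label_key: str = "componentLabel") -> Counter:
    """Decode the JSON string and count how often each label occurs in the tree."""
    root = json.loads(annotation_json)
    counts: Counter = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        if label_key in node:
            counts[node[label_key]] += 1
        # Skip missing, null, or non-object children defensively.
        children = node.get("children") or []
        stack.extend(child for child in children if isinstance(child, dict))
    return counts


label_counts = count_labels(SAMPLE)
print(label_counts.most_common())
```

The traversal uses an explicit stack rather than recursion so that deeply nested hierarchies do not hit Python's recursion limit. The same function could be pointed at `view_hierarchy` with a different `label_key`, again assuming that field decodes to a nested object.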

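Dataset splits

Training set (train):
- Size in bytes: 3618314253.896
- Number of examples: 15743
Validation set (val):
- Size in bytes: 520496985.148
- Number of examples: 2364
Test set (test):
- Size in bytes: 956009390.03
- Number of examples: 4310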
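
As a quick arithmetic check, the example counts above can be turned into split proportions. This sketch uses only the numbers listed in the card; the variable names are illustrative.

```python
# Example counts copied from the split metadata above.
SPLIT_EXAMPLES = {"train": 15743, "val": 2364, "test": 4310}

total_examples = sum(SPLIT_EXAMPLES.values())
split_share = {name: count / total_examples for name, count in SPLIT_EXAMPLES.items()}

for name, share in split_share.items():
    print(f"{name:>5}: {SPLIT_EXAMPLES[name]:>6} screens ({share:.1%})")
print(f"total: {total_examples} screens")
```

The three splits add up to 22,417 screens, divided roughly 70% / 11% / 19% between train, val, and test.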

Dataset size

Download size (bytes): 2473562659
Dataset size (bytes): 5094820629.073999
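
The byte counts are easier to reason about in GiB, and the per-split sizes should add up to the reported dataset size. The sketch below checks that using the figures from the card. Reading `download_size` as the size of the files fetched and `dataset_size` as the size after loading is an assumption based on how Hugging Face metadata is usually defined.

```python
# Figures copied from the card metadata.
SPLIT_BYTES = {"train": 3618314253.896, "val": 520496985.148, "test": 956009390.03}
DOWNLOAD_SIZE = 2473562659
DATASET_SIZE = 5094820629.073999


def to_gib(num_bytes: float) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


splits_total = sum(SPLIT_BYTES.values())
# Assumption: the ratio reflects how much the data grows once downloaded files are loaded.
expansion_factor = DATASET_SIZE / DOWNLOAD_SIZE

print(f"download: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
print(f"dataset:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"sum of splits matches dataset size: {abs(splits_total - DATASET_SIZE) < 1.0}")
print(f"expansion factor: {expansion_factor:.2f}")
```

Under that reading, users should plan for roughly twice the download size in local disk space.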

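Collected summary

Dataset introduction

Construction

In mobile UI automation research, Screen2Words was built through a rigorous academic process. The dataset is based on the RICO database of mobile app screenshots and takes the subset of screens whose view hierarchies are complete and accurate, namely the screens used in RICO-SCA. The research team then engaged 85 professional annotators to write concise text summaries for the selected screenshots. The result combines the original visual data with high-quality language annotations to provide an accurate corpus for multimodal learning.

Features

The core feature of Screen2Words is its combination of multiple modalities with rich metadata. It contains the original mobile app screenshots together with the corresponding semantic abstraction images, app icons, and full view hierarchy data. Most importantly, each record carries several human-written text summaries, which aligns the visual content with language descriptions. The dataset also includes metadata such as app category, download count, and ratings, giving several dimensions for analyzing interface context and user interaction patterns.

Usage

The dataset mainly serves research on automatic mobile UI understanding and multimodal learning. Users can load the images and their text annotations to train image-to-text generation models for automatic screen summarization. By combining the semantic annotation images with the view hierarchy data, researchers can also build language-based UI retrieval systems or enhance screen reader technology. The dataset is already divided into training, validation, and test sets, so it can be used directly for model training, validation, and performance evaluation.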
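
Loading the images and captions could look like the sketch below, which assumes the third-party Hugging Face `datasets` package is installed. The repository ID and split names come from the card. The helper names `caption_pairs` and `stream_split` are hypothetical, and streaming is a design choice made here to avoid downloading the full dataset up front.

```python
from typing import Any, Dict, Iterator, Tuple

REPO_ID = "rootsautomation/RICO-Screen2Words"
SPLITS = ("train", "val", "test")


def caption_pairs(example: Dict[str, Any]) -> Iterator[Tuple[int, str]]:
    """Yield one (screenId, caption) pair per caption in a record."""
    for caption in example["captions"]:
        yield example["screenId"], caption


def stream_split(split: str = "train"):
    """Return a lazily streamed split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported here so the pure helper above works without the package.
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset(REPO_ID, split=split, streaming=True)


# Example usage (requires network access and the `datasets` package):
#   first = next(iter(stream_split("val")))
#   print(list(caption_pairs(first)))
```

Flattening each record into one pair per caption is convenient for captioning models that expect a single target string per training example; the `image` field of the same record supplies the input.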

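Background and challenges

Background overview

Screen2Words was introduced in 2021 by a research team from Google Research together with UIUC, Northwestern University, and the University of Toronto, with the aim of advancing automatic mobile UI summarization. Built on the RICO mobile app image database, it has professional annotators write concise text descriptions for a large number of phone screenshots. Its core research question is how to use multimodal learning to turn complex visual interface information into natural-language summaries. The work provides key data for automated UI understanding and interaction, and it supports research on enhanced screen readers, UI retrieval, and conversational mobile applications.

Current challenges

For the task of automatic mobile UI summarization, the main challenge is to capture the diverse visual elements and interaction logic of a screen accurately and to generate coherent, informative natural-language descriptions. Dynamic layouts, ambiguous icon semantics, and differences between application scenarios all place high demands on a model's ability to generalize. On the data construction side, the challenge lies in ensuring annotation quality and consistency: 85 professional annotators had to be coordinated, screens with incomplete or incorrect view hierarchies had to be eliminated, and each screen needed several accurate summaries from different perspectives, all while keeping large-scale data processing efficient and reliable.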

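Common scenarios

Classic use case

In mobile interface design and human-computer interaction, Screen2Words provides key support for the automatic screen summarization task. By combining mobile app screenshots from the RICO database with human-annotated text descriptions, the dataset forms a basis for multimodal learning. Researchers use its image-text pairs to train models that extract semantic information from visual interfaces and generate concise overviews of screen content, advancing interface understanding.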
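
Since each screen carries several reference captions, a generated summary is naturally scored against all of them. The sketch below uses a simple token-overlap F1 and keeps the best score over the references. This is a deliberately simple stand-in chosen for illustration, not the evaluation protocol of the Screen2Words paper, and the captions shown are hypothetical.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def token_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_reference_f1(candidate: str, references: List[str]) -> float:
    """Score a candidate against every reference and keep the best match."""
    return max((token_f1(candidate, ref) for ref in references), default=0.0)


# Hypothetical captions, not taken from the dataset.
references = [
    "sign in page of a shopping app",
    "login screen with email and password fields",
]
score = best_reference_f1("login screen of a shopping app", references)
print(f"best token F1: {score:.3f}")
```

Taking the maximum rewards a summary that matches any one annotator well; averaging over references would instead reward summaries that sit between them.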

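Academic problems addressed

Screen2Words addresses a core challenge in mobile UI automation: how to turn complex visual elements into natural-language descriptions. It supports work on multimodal fusion, semantic parsing of interfaces, and automatic summary generation, and it provides a data basis for assessing whether an app's design meets its goals. The dataset's significance lies in promoting cross-modal learning research, enabling machines to understand user interfaces more precisely and making automation systems more capable.

Derived and related work

Several research efforts have built on the Screen2Words dataset, beginning with the Screen2Words automatic mobile UI summarization model itself, which uses multimodal learning to convert screens into text. Later research has explored UI generation, design validation, and interaction automation, advancing data-driven interface design tools and providing theoretical and practical foundations for mobile app innovation.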

The content above was collected and summarized by 遇见数据集.
