ShapeWorld
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/alexkuhnle/shapeworld
Description:
ShapeWorld is a configurable generation system for abstract, visually grounded language data. It produces images paired with captions and consistency values, where each consistency value indicates whether the caption is true of its image. The data covers a variety of quantifiers and was generated to support experiments similar to those in psycholinguistic research on the interpretation of quantifiers such as "most". Each image contains up to 15 objects placed at random positions. The dataset is suited to visual question answering (VQA) and image-caption consistency tasks.
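To make the idea of a consistency value concrete, the following is a minimal sketch of how a caption with the quantifier "most" could be checked against a randomly generated scene. It does not use the ShapeWorld code or API. The names `Obj`, `generate_scene`, and `most_agreement`, the shape and color vocabularies, and the "more than half" reading of "most" are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabularies, chosen only for this illustration.
SHAPES = ["square", "circle", "triangle"]
COLORS = ["red", "blue", "green"]


@dataclass
class Obj:
    shape: str
    color: str
    x: float  # horizontal position in [0, 1)
    y: float  # vertical position in [0, 1)


def generate_scene(rng, max_objects=15):
    """Return between 1 and max_objects objects at random positions."""
    n = rng.randint(1, max_objects)
    return [
        Obj(rng.choice(SHAPES), rng.choice(COLORS), rng.random(), rng.random())
        for _ in range(n)
    ]


def most_agreement(scene, shape, color):
    """Consistency value for the caption 'Most <shape>s are <color>.'

    Assumption: 'most' means strictly more than half. If the scene has
    no object of the given shape, this sketch returns False.
    """
    restrictor = [o for o in scene if o.shape == shape]
    if not restrictor:
        return False
    scope = [o for o in restrictor if o.color == color]
    return len(scope) > len(restrictor) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    scene = generate_scene(rng)
    caption = "Most squares are red."
    print(caption, most_agreement(scene, "square", "red"))
```

Each (scene, caption, consistency value) triple produced this way has the same form as a data point described above: a model sees the rendered scene and the caption and must predict the Boolean value.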
Provided by:
Kuhnle and Copestake



