jn12/VisualGenome
Hugging Face | Updated 2026-04-28 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/jn12/VisualGenome
Description:
Visual Genome is a large-scale vision-language dataset with dense image annotations: images, region descriptions, object annotations, attribute annotations, relationship annotations, scene graphs, visual question answering (VQA) annotations, and WordNet synset mappings. It contains approximately 108K images, 5.4M region descriptions, 1.7M visual question-answer pairs, 3.8M object instances, 2.8M attributes, and 2.3M relationships. Research applications include image-text representation learning, image captioning, dense captioning, object detection, attribute recognition, visual relationship detection, scene graph generation, visual question answering, and vision-language pretraining.
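In Visual Genome, object, attribute, and relationship annotations together form a per-image scene graph. The sketch below illustrates that structure under a hypothetical, simplified schema: the class names (`VGObject`, `VGRelationship`, `SceneGraph`), their fields, and the toy annotations are illustrative assumptions and do not reflect the actual field layout of `jn12/VisualGenome`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical, simplified record types; field names are illustrative
# assumptions, not the actual schema of jn12/VisualGenome.
@dataclass
class VGObject:
    object_id: int
    name: str
    bbox: Tuple[int, int, int, int]  # assumed (x, y, w, h) in pixels
    attributes: List[str] = field(default_factory=list)
    synset: str = ""  # WordNet synset name, e.g. "man.n.01"


@dataclass
class VGRelationship:
    subject_id: int  # object_id of the subject
    predicate: str
    object_id: int  # object_id of the object


@dataclass
class SceneGraph:
    image_id: int
    objects: List[VGObject]
    relationships: List[VGRelationship]

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return (subject name, predicate, object name) triples."""
        by_id: Dict[int, VGObject] = {o.object_id: o for o in self.objects}
        return [
            (by_id[r.subject_id].name, r.predicate, by_id[r.object_id].name)
            for r in self.relationships
        ]

    def attribute_pairs(self) -> List[Tuple[str, str]]:
        """Return (object name, attribute) pairs."""
        return [(o.name, a) for o in self.objects for a in o.attributes]


# Toy, made-up annotations for a single image.
graph = SceneGraph(
    image_id=1,
    objects=[
        VGObject(1, "man", (10, 20, 50, 120), ["standing"], "man.n.01"),
        VGObject(2, "horse", (70, 40, 140, 110), ["brown"], "horse.n.01"),
    ],
    relationships=[VGRelationship(1, "riding", 2)],
)

print(graph.triples())          # [('man', 'riding', 'horse')]
print(graph.attribute_pairs())  # [('man', 'standing'), ('horse', 'brown')]
```

Relationships reference objects by ID rather than by name, so an image with two objects named "man" still yields unambiguous edges; the triples are resolved to names only at output time.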
Provided by:
jn12



