neulab/CulturalGround
收藏Hugging Face2025-10-23 更新2025-08-09 收录
下载链接:
https://hf-mirror.com/datasets/neulab/CulturalGround
下载链接
链接失效反馈官方服务:
资源简介:
CulturalGround是一个大规模的多语言多模态视觉问答(VQA)数据集,旨在通过文化知识对多语言多模态语言模型进行接地。它包含了超过2200万个开放式问题和800万个多项选择VQA对,跨越了42个国家和39种语言。数据集是通过一个可扩展的流程创建的,该流程利用Wikidata识别文化概念,从Wikimedia Commons收集相应的图像,并生成基于事实的VQA对。CulturalGround数据集被用于训练CulturalPangea模型。
CulturalGround is a large-scale multilingual and multimodal visual question-answering (VQA) dataset designed to ground multimodal language models in diverse cultural knowledge. It includes over 22 million open-ended and 8 million multiple-choice VQA pairs, spanning 42 countries and 39 languages. The dataset was created using a scalable pipeline that leverages Wikidata to identify cultural concepts, gather corresponding images from Wikimedia Commons, and generate factually grounded VQA pairs. CulturalGround has been used to train the CulturalPangea model.
提供机构:
neulab



