Visual QA dataset
Source: arXiv · Updated: 2018-06-11 · Indexed: 2024-06-21
Download link:
http://www.teds.usc.edu/website_vqa/
Description:
The Visual QA dataset is a large-scale multiple-choice visual question answering dataset created by the University of Southern California. It contains over 1.4 million image-question-candidate answer triplets built on MSCOCO images, with diverse questions and answers produced through human annotation. Its purpose is to evaluate and improve machine understanding of, and reasoning over, visual and linguistic information. The candidate answers received particular attention during construction, so that a model must combine information from the image, the question, and the candidate answers to respond correctly. The dataset is used primarily for visual question answering and targets the difficulty machines have in understanding and processing multimodal information.
Provider:
University of Southern California
Created:
2017-04-24



