OK-VQA and A-OKVQA Datasets

Name: OK-VQA and A-OKVQA Datasets
Creator: Science Data Bank
Published: 2025-12-02 08:50:35
License: Not specified

Source: DataCite Commons. Updated: 2025-12-02. Indexed: 2026-05-05.

Download link:

https://www.scidb.cn/detail?dataSetId=2df55de402c64f1abfc47d108c44af59

Description:

OK-VQA is a visual question answering (VQA) benchmark whose questions require external knowledge. It contains over 14,000 questions that cannot be answered from the image alone, spanning knowledge categories such as science & technology, history, and sports. The images come from the COCO dataset, and the original 80k/40k COCO training and validation splits serve as the OK-VQA training and test splits. The dataset was built through several rounds of annotation and filtering. In the first round, MTurk workers wrote questions about each image that require external knowledge. In the second round, five different workers annotated an answer for each question-image pair. Low-quality questions were then filtered out manually, the answer distribution was kept uniform, and questions with inconsistent annotations were removed. The final dataset has 9,009 training questions and 5,046 test questions. OK-VQA aims to advance research on knowledge-based VQA. Compared with other datasets, it puts more emphasis on a model's ability to reason over unstructured knowledge, and the performance of state-of-the-art VQA models drops significantly on it.

A-OKVQA is a VQA benchmark of approximately 25K questions built on images from the COCO 2017 dataset, with 23.7K unique images retained after filtering. It was produced through multiple rounds of annotation and screening by 437 crowdworkers on Amazon Mechanical Turk and is split into 17.1K training, 1.1K validation, and 6.7K test questions. Answering its questions requires combining several kinds of external knowledge, such as commonsense, world knowledge, and visual knowledge, with reasoning. Each question comes with multiple-choice options, 10 free-form answers, and 3 rationales, so the dataset supports both direct-answer and multiple-choice evaluation. It differs significantly from earlier knowledge-based VQA datasets such as OK-VQA in knowledge types and question diversity, and it can effectively evaluate a model's reasoning and knowledge-application abilities.
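The two evaluation modes mentioned for A-OKVQA (direct answer and multiple choice) can be illustrated with a minimal sketch. It assumes the commonly used VQA-style soft accuracy, min(matches / 3, 1), for direct answers and plain exact match for multiple choice. The function names and the simple lowercase/strip normalization are hypothetical; the official evaluation scripts may normalize answers differently.

```python
def direct_answer_score(prediction: str, reference_answers: list[str]) -> float:
    """VQA-style soft accuracy (assumed metric): min(#matching references / 3, 1)."""
    normalized = prediction.strip().lower()
    # Count how many of the free-form reference answers match the prediction.
    matches = sum(1 for ans in reference_answers if ans.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


def multiple_choice_score(predicted_index: int, correct_index: int) -> float:
    """Exact-match accuracy for one multiple-choice question."""
    return 1.0 if predicted_index == correct_index else 0.0


if __name__ == "__main__":
    # Hypothetical example: 10 free-form answers for one question.
    references = ["umbrella"] * 2 + ["parasol"] * 8
    print(direct_answer_score("Umbrella", references))
    print(multiple_choice_score(predicted_index=2, correct_index=2))
```

Under this assumed metric, a prediction earns full credit once at least three of the ten free-form answers agree with it, which tolerates some disagreement among annotators.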

Provider:

Science Data Bank

Created:

2025-12-02