OK-VQA (Outside Knowledge Visual Question Answering)

Name: OK-VQA (Outside Knowledge Visual Question Answering)
Creator: OpenDataLab
Published: 2026-05-24 04:30:03
License: No description provided

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/OK-VQA

Resource overview:

In its ideal form, visual question answering (VQA) lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date focus on questions such as simple counting, visual attributes, and object detection, which require no reasoning or knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and present a benchmark, OK-VQA, in which the image content alone is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. Our new dataset contains more than 14,000 questions that require external knowledge to answer. We show that the performance of state-of-the-art VQA models degrades sharply in this new setting. Our analysis shows that our knowledge-based VQA task is diverse, difficult, and large compared with previous knowledge-based VQA datasets. We hope that this dataset enables researchers to open up new avenues of research in this field.

Provider:

OpenDataLab

Created:

2022-04-29

Collected summary

Dataset introduction