OK-VQA

Name: OK-VQA
Creator: maas
Published: 2025-10-31 16:15:51
License: No description available

ModelScope community; updated 2025-10-31; indexed 2024-08-31

Download link:

https://modelscope.cn/datasets/OmniData/OK-VQA


Resource overview:

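---
displayName: OK-VQA (Outside Knowledge Visual Question Answering)
labelTypes:
  - Text
license:
  - Unknown
mediaTypes:
  - Image
  - Text
paperUrl: https://arxiv.org/pdf/1906.00067v2.pdf
publishDate: "2019"
publishUrl: https://okvqa.allenai.org/
publisher:
  - Carnegie Mellon University
  - University of Washington
  - Allen Institute for Artificial Intelligence
tags:
  - Language
taskTypes:
  - Question Generation
  - Visual Question Answering
---

# Dataset Introduction

## Overview

In its ideal form, Visual Question Answering (VQA) lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date focus on questions such as simple counting, visual attributes, and object detection, which require neither reasoning nor knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, in which the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. Our new dataset includes more than 14,000 questions that require external knowledge to answer. We show that the performance of state-of-the-art VQA models degrades drastically in this new setting. Our analysis shows that our knowledge-based VQA task is diverse, difficult, and large compared to previous knowledge-based VQA datasets. We hope that this dataset enables researchers to open up new avenues for research in this domain.

## Citation

```
@inproceedings{marino2019ok,
  title={Ok-vqa: A visual question answering benchmark requiring external knowledge},
  author={Marino, Kenneth and Rastegari, Mohammad and Farhadi, Ali and Mottaghi, Roozbeh},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3195--3204},
  year={2019}
}
```

## Download dataset

The dataset repository is hosted at https://modelscope.cn/datasets/OmniData/OK-VQA.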
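The download section points to a git-based copy of the dataset. A minimal sketch of scripting that step follows. It assumes the repository can be cloned by appending `.git` to the dataset page URL listed on this page, which is not confirmed here; the helper names `build_clone_command` and `clone_dataset` are hypothetical, and large files may additionally require git-lfs.

```python
import subprocess
from pathlib import Path
from typing import List

# Dataset page URL as listed on this page.
DATASET_PAGE = "https://modelscope.cn/datasets/OmniData/OK-VQA"


def build_clone_command(target_dir: str = "OK-VQA") -> List[str]:
    """Return the git command that would clone the dataset repository.

    Assumption: the git remote is the dataset page URL with a ".git" suffix.
    """
    return ["git", "clone", DATASET_PAGE + ".git", target_dir]


def clone_dataset(target_dir: str = "OK-VQA") -> Path:
    """Run the clone (requires git and network access) and return the local path."""
    subprocess.run(build_clone_command(target_dir), check=True)
    return Path(target_dir)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_clone_command()))
```

Calling `clone_dataset("data/okvqa")` would perform the actual download into `data/okvqa`; building the command separately makes it easy to inspect or adjust the URL first if the assumed remote turns out to differ.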


Provider:

maas

Created:

2024-07-14
