GazeVQA

Name: GazeVQA
Creator: Nara Institute of Science and Technology (NAIST)
Published: 2024-03-26 17:49:35
License: Not specified

Source: arXiv. Updated: 2024-03-26. Indexed: 2024-06-21.

Download link:

https://github.com/riken-grp/GazeVQA

Overview:

GazeVQA was created jointly by the Nara Institute of Science and Technology (NAIST) and the Guardian Robot Project at RIKEN. It targets visual question answering (VQA) in which gaze information is used to clarify ambiguous questions posed in Japanese. The dataset contains 17,276 question-answer pairs over 10,760 images and concentrates on ambiguity caused by instructions and by ellipsis (omitted expressions), both of which gaze information can resolve. It was collected by crowdsourcing on a subset of MS-COCO, with the questions designed to be difficult to answer without the accompanying gaze information. Its main application is improving how human-computer interaction systems understand and respond to visual information, particularly when handling ambiguous instructions and elliptical expressions.
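As a rough illustration of the scale described above, the following Python sketch derives the average number of question-answer pairs per image from the two stated counts and shows one way such records might be grouped by image. The `GazeQARecord` class, its field names, the bounding-box layout, and the toy records are all assumptions made for illustration; they are not the schema or contents of the GazeVQA repository.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# Counts stated in the dataset description.
NUM_QA_PAIRS = 17_276
NUM_IMAGES = 10_760


@dataclass(frozen=True)
class GazeQARecord:
    """Hypothetical record layout; NOT the schema used by the GazeVQA repository."""
    image_id: int   # assumed: identifier of the MS-COCO image
    question: str   # Japanese question that is ambiguous without gaze
    answer: str     # crowdsourced answer
    gaze_target: Tuple[float, float, float, float]  # assumed: (x, y, w, h) box of the gazed-at region


def qa_pairs_per_image(records: List[GazeQARecord]) -> Counter:
    """Count how many question-answer pairs refer to each image."""
    return Counter(record.image_id for record in records)


# Average number of question-answer pairs per image, from the stated totals.
average_pairs = NUM_QA_PAIRS / NUM_IMAGES

# Toy records invented for this example only.
toy_records = [
    GazeQARecord(1, "これは何色ですか？", "赤", (10.0, 20.0, 50.0, 40.0)),
    GazeQARecord(1, "何をしていますか？", "座っている", (80.0, 15.0, 30.0, 60.0)),
    GazeQARecord(2, "これは何ですか？", "傘", (5.0, 5.0, 25.0, 25.0)),
]
counts = qa_pairs_per_image(toy_records)

print(f"Average QA pairs per image: {average_pairs:.2f}")
print(f"QA pairs per image in the toy sample: {dict(counts)}")
```

The same question text (for example "What is this?") can map to different answers depending on the gaze target, which is why the sketch keeps the gaze region alongside each question rather than treating the question and image alone as the key.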

Provider:

Nara Institute of Science and Technology (NAIST)

Created:

2024-03-26
