NExT-GQA
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/doc-doc/next-gqa
Resource description:
NExT-GQA extends the NExT-QA dataset with 10,500 temporal grounding labels tied to the original question-answer pairs. Its goal is to make video question answering (VideoQA) systems more reliable by providing visual evidence that supports the answers. The temporal labels are provided for the weakly supervised setting and were annotated by a team of 30 undergraduate students. The labels cover 8,911 question-answer pairs and 1,557 videos, and the dataset is intended to advance visually grounded VideoQA.



