DocVQA Spoken Version

Name: DocVQA Spoken Version
Creator: Authors of the paper
License: Not specified

Indexed from arXiv on 2025-09-30

Download link:

https://vqsqainterspeech.github.io/

Description:

This dataset is a spoken version of the DocVQA evaluation set, created by synthesizing the benchmark's text questions into speech. Only the DocVQA questions are converted to speech; all other prompt elements remain in text form. The dataset provides multiple evaluation versions covering 42 distinct speakers, none of whom appear among the training-set speakers. The target task is Spoken Visual Question Answering (SVQA).
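The construction described above (speech for the question, text for everything else, evaluation speakers disjoint from training speakers) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the names `TextSegment`, `AudioSegment`, `build_spoken_prompt`, `speakers_are_disjoint`, and the stand-in `fake_tts` function are all hypothetical, and the actual TTS system, prompt layout, and file format are not stated in this listing.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union


@dataclass
class TextSegment:
    """A prompt element that stays in text form (hypothetical type)."""
    text: str


@dataclass
class AudioSegment:
    """A prompt element rendered as speech (hypothetical type)."""
    waveform: bytes
    speaker_id: str


Segment = Union[TextSegment, AudioSegment]


def build_spoken_prompt(
    instruction: str,
    question: str,
    speaker_id: str,
    synthesize: Callable[[str, str], bytes],
) -> List[Segment]:
    """Keep the instruction as text and replace only the question with speech."""
    audio = synthesize(question, speaker_id)
    return [TextSegment(instruction), AudioSegment(audio, speaker_id)]


def speakers_are_disjoint(
    train_speakers: Iterable[str], eval_speakers: Iterable[str]
) -> bool:
    """Return True when no evaluation speaker also appears in training."""
    return set(train_speakers).isdisjoint(eval_speakers)


def fake_tts(text: str, speaker_id: str) -> bytes:
    """Stand-in for a real TTS system (assumption): returns tagged bytes, not audio."""
    return f"{speaker_id}:{text}".encode("utf-8")


prompt = build_spoken_prompt(
    instruction="Answer the question using the document image.",
    question="What is the invoice date?",
    speaker_id="spk_01",
    synthesize=fake_tts,
)
```

In this sketch the document image would be passed to the model separately; only the question changes modality, which mirrors the description above.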

Provided by:

Authors of the paper
