PDF-MVQA
Source: arXiv. Updated: 2024-04-19. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2404.12720v1
Resource overview:
The PDF-MVQA dataset was developed by research teams from the University of Melbourne, the University of Sydney, and the University of Western Australia for multi-page, multimodal information retrieval over research journal articles. Unlike traditional machine reading comprehension tasks, its core objective is to retrieve the complete paragraphs or visually rich document entities (such as tables and figures) that contain the answer. The dataset comprises 3,146 documents totaling 30,239 pages, with an average of about 84 questions per document and 262,928 question-answer pairs overall. PDF-MVQA is accompanied by a new multimodal document entity retrieval framework, and it aims to help existing vision-and-language models address the challenges of visual question answering over text-dominant documents.
Provided by:
University of Melbourne, University of Sydney, University of Western Australia
Created:
2024-04-19



