PaperPDF
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/yh-hust/PDF-Wukong
Overview:
This dataset consists of academic papers collected at scale from arXiv. From these papers, 1 million question-answer pairs and their corresponding evidence sources were generated automatically. The dataset was built specifically to train and evaluate the PDF-WuKong model's ability to understand long multimodal documents. The target task is multimodal question answering.
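The entry does not describe the record format. As a rough illustration only, the sketch below assumes that each question-answer pair carries a list of evidence sources, each tagged with a page number and a modality (text or image). The class names, the field names, and the `needs_image` / `split_by_modality` helpers are all hypothetical and are not the PaperPDF schema. Consult the repository for the actual format.

```python
from dataclasses import dataclass, field
from typing import List, Literal


# HYPOTHETICAL schema for illustration; not the actual PaperPDF release format.
@dataclass
class Evidence:
    page: int                       # page in the source PDF (assumed 1-based)
    kind: Literal["text", "image"]  # modality of the evidence chunk
    content: str                    # text span, or an image path/identifier


@dataclass
class QAPair:
    paper_id: str                   # identifier of the source paper (assumed)
    question: str
    answer: str
    evidence: List[Evidence] = field(default_factory=list)


def needs_image(pair: QAPair) -> bool:
    """Return True if at least one evidence source for this pair is an image."""
    return any(ev.kind == "image" for ev in pair.evidence)


def split_by_modality(pairs: List[QAPair]):
    """Partition pairs into (text-only evidence, evidence that includes images)."""
    text_only: List[QAPair] = []
    multimodal: List[QAPair] = []
    for p in pairs:
        (multimodal if needs_image(p) else text_only).append(p)
    return text_only, multimodal
```

A split like this could be used to report question-answering accuracy separately for text-only and image-grounded questions, assuming the real data exposes evidence modality in some comparable form.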
Provided by:
Authors of the paper



