MEWA
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/ezosa/M3L-topic-model/tree/master/data
Description:
This dataset contains English Wikipedia articles aligned with images from the Wikipedia-based Image Text (WIT) dataset. Each document consists of a complete English Wikipedia article and its corresponding images. The authors used about 18,500 documents. The dataset is intended for multimodal document analysis.



