Wikipedia Comparable Corpora
收藏arXiv2025-09-30 收录
下载链接:
https://linguatools.org/tools/corpora/wikipedia-comparable-corpora/
下载链接
链接失效反馈官方服务:
资源简介:
该数据集包含了英文和德文的维基百科文章,这些文章经过对齐处理,用于训练和评估M3L-Contrast模型。此外,数据集还包括了对齐的图片,专门用于评估M3L-Contrast模型的表现。其规模包括20,000个训练元组和1,000个测试元组,任务涉及多语言和多模态的主题建模。
This dataset comprises aligned English and German Wikipedia articles, intended for training and evaluating the M3L-Contrast model. Furthermore, the dataset includes aligned images specifically used to assess the performance of the M3L-Contrast model. It consists of 20,000 training tuples and 1,000 test tuples, with its tasks involving multilingual and multimodal topic modeling.
提供机构:
Linguatools



