WikiMatrix
arXiv · Indexed 2025-09-30
Download link:
https://github.com/facebookresearch/laser/tree/master/tasks/wikimatrix
Resource overview:
WikiMatrix is a dataset of parallel text between English and other languages, used for multilingual pre-training. It contains 19 million parallel sentence pairs, and one of its subsets was collected specifically for the IGLUE benchmark.



