Multilingual Benchmark
Source: arXiv (indexed 2025-09-30)
Download link:
https://statmt.org/wmt19/translation-task.html
Description:
This is a diverse multilingual dataset for machine translation that covers 15 languages, each paired with English. Training set size varies widely by language, from 155,000 to 51 million examples. The dataset uses a shared 64K BPE vocabulary and marks the target language with a language ID token.
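As a rough illustration of how a language ID token can indicate the target language, the sketch below prepends a tag to each source sentence before tokenization. The `<2xx>` token format and the helper name `add_target_language_tag` are assumptions made for this example; the listing does not specify the exact token format the dataset uses.

```python
def add_target_language_tag(source: str, target_lang: str) -> str:
    """Prepend a target-language ID token to a source sentence.

    The "<2xx>" format is a hypothetical convention chosen for this sketch,
    not necessarily the one used by the dataset.
    """
    lang = target_lang.strip().lower()
    if not lang:
        raise ValueError("target_lang must be a non-empty language code")
    return f"<2{lang}> {source.strip()}"


# Example: mark an English sentence for translation into German.
tagged = add_target_language_tag("How are you?", "de")
print(tagged)  # <2de> How are you?
```

With a shared vocabulary, a tag like this would typically be reserved as a single vocabulary entry so that BPE does not split it into pieces.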
Provided by:
WMT



