PMI subset of the WAT 2021 MultiIndicMT training set
Source: arXiv, indexed 2025-09-30
Download link:
http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual
Description:
This dataset is a low-resource parallel corpus of approximately 326,000 sentence pairs for neural machine translation (NMT). It represents an extremely low-resource setting, the scenario in which IndicBART is expected to provide the greatest benefit.
Provider:
WAT 2021



