PMI subset of the WAT 2021 MultiIndicMT training set

Name: PMI subset of the WAT 2021 MultiIndicMT training set
Creator: WAT 2021
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

http://lotus.kuee.kyoto-u.ac.jp/WAT/indic-multilingual

Description:

This dataset is a low-resource parallel corpus of approximately 326,000 sentence pairs for neural machine translation (NMT). It represents an extremely low-resource setting, in which IndicBART is expected to provide the greatest benefit.

Provider:

WAT 2021
