IndicNECorp1.0
Source: arXiv. Indexed 2025-09-30.
Download link:
https://www2.statmt.org/wmt24/indic-mt-task.html
Description:
This dataset contains approximately 2.6 million Assamese sentences, 100,000 Khasi sentences, 2 million Mizo sentences, and 1 million Manipuri sentences, including bilingual parallel sentence pairs between English and the Indic languages. It also includes training sets of 50,000 parallel sentence pairs each for English-Assamese and English-Mizo, 24,000 pairs for English-Khasi, and 21,600 pairs for English-Manipuri. The listing states an overall total of approximately 2.6 million sentences, although the per-language counts above sum to roughly 5.7 million. The task targets machine translation for low-resource Indic languages.
Provided by:
WMT24 Indic MT Shared Task organizers



