Luganda-English pairwise corpus

Name: Luganda-English pairwise corpus
Creator: Makerere AI and data science research lab
Published: 2023-01-07 11:26:09
License: No description available

Source: arXiv (updated 2023-01-07, indexed 2024-08-06)

Download link:

http://arxiv.org/abs/2301.02773v1

Resource description:

This study presents the Luganda-English pairwise corpus, a bilingual parallel corpus created by the Makerere AI and data science research lab in collaboration with other institutions. The dataset contains 41,070 parallel Luganda-English sentence pairs drawn from three different open-source corpora. The researchers cleaned and preprocessed the data to ensure translation quality. The dataset is intended mainly for training neural machine translation models, with the aim of improving translation for Luganda, a low-resource language, and of addressing the lack of Luganda support in existing translation tools such as Google Translate.
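
The description mentions cleaning and preprocessing but does not list the individual steps. As a rough illustration only, the sketch below shows one common way to clean and de-duplicate parallel sentence pairs after merging several sources. The function names (`normalize`, `clean_pairs`), the `max_len_ratio` threshold, and the filtering rules are all assumptions made for this example, not the procedure used by the dataset's authors, and the sample strings are placeholders rather than real Luganda text.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def clean_pairs(pairs, max_len_ratio=3.0):
    """Return cleaned, de-duplicated (luganda, english) pairs.

    The rules below are illustrative assumptions, not the study's procedure.
    """
    seen = set()
    cleaned = []
    for lg, en in pairs:
        lg, en = normalize(lg), normalize(en)
        if not lg or not en:
            continue  # drop pairs with an empty side
        n_lg, n_en = len(lg.split()), len(en.split())
        if max(n_lg, n_en) / min(n_lg, n_en) > max_len_ratio:
            continue  # drop pairs whose word counts differ too much
        key = (lg.lower(), en.lower())
        if key in seen:
            continue  # drop case-insensitive duplicates across sources
        seen.add(key)
        cleaned.append((lg, en))
    return cleaned


if __name__ == "__main__":
    # Placeholder strings standing in for sentences merged from several sources.
    raw = [
        ("aaa  bbb", "one two"),
        ("aaa bbb", "One two"),
        ("", "empty source"),
        ("ccc", "this target side is far too long"),
    ]
    for pair in clean_pairs(raw):
        print(pair)
```

A length-ratio filter of this kind is a cheap heuristic for catching misaligned pairs; a suitable threshold would need to be tuned on the actual data.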

Provided by:

Makerere AI and data science research lab

Created:

2023-01-07
