
English-Akuapem Twi Parallel Corpus

Source: arXiv. Updated: 2021-04-01. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2103.15625v3
Resource description:
The English-Akuapem Twi Parallel Corpus contains 25,421 English and Akuapem Twi sentence pairs and was created by the Ghanaian Society for Natural Language Processing. It is intended to support further training of machine translation models, particularly for the Akuapem dialect of Twi. To build it, a Transformer-based translator generated preliminary translations, which native speakers then verified and corrected where necessary. A further 697 high-quality crowdsourced sentences are provided as an evaluation set for downstream natural language processing tasks. Applications include machine translation, representation learning, and classification, with the broader aim of helping preserve Ghanaian languages and culture in an increasingly digital future.
Provided by:
Ghanaian Society for Natural Language Processing
Created:
2021-03-29