
ENGLISH-AKUAPEM TWI PARALLEL CORPUS

Source: Zenodo | Updated 2021-01-10 | Indexed 2026-05-25
Download link:
https://zenodo.org/record/4430881
Description:
This dataset (verified_data.csv) is a bilingual English-Akuapem Twi machine translation training corpus of 25,421 sentence pairs.

A transformer-based machine translation model generated the initial Akuapem Twi translations. Native speakers then verified them and corrected them where necessary.

The dataset is primarily intended for further training of machine translation models for Akuapem Twi. With appropriate additional annotation, it can also be used for other downstream NLP tasks such as Named Entity Recognition (NER) and part-of-speech (POS) tagging. Another potential application is training unsupervised embeddings for the Akuapem Twi language.

In addition, a higher-quality set of 697 crowdsourced sentences (crowdsourced_data.csv) is provided as an evaluation set for the tasks above. It is recommended as a test set for both English-to-Twi and Twi-to-English machine translation models.

Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa.
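The two CSV files can be prepared for the translation use case described above with a few lines of Python. This is a minimal sketch, not an official loader, and several parts are assumptions:

- The column names `English` and `Twi` are hypothetical placeholders. Check the real header row of each file and adjust them.
- The helpers `read_pairs`, `load_csv`, `bidirectional` and `drop_overlap` are illustrative names.

The sketch does three things:

1. Reads the sentence pairs and skips incomplete rows.
2. Expands each pair into examples for both translation directions.
3. Optionally drops training pairs whose English or Twi side also appears in the crowdsourced evaluation set, so test scores are not inflated by overlap.

```python
import csv
from typing import Iterable, List, Tuple

# ASSUMPTION: hypothetical column headers; replace with the actual headers
# found in verified_data.csv and crowdsourced_data.csv.
EN_COL = "English"
TW_COL = "Twi"

Pair = Tuple[str, str]


def read_pairs(rows: Iterable[dict], en_col: str = EN_COL, tw_col: str = TW_COL) -> List[Pair]:
    """Return (english, twi) pairs, skipping rows where either side is missing or blank."""
    pairs = []
    for row in rows:
        en = (row.get(en_col) or "").strip()
        tw = (row.get(tw_col) or "").strip()
        if en and tw:
            pairs.append((en, tw))
    return pairs


def load_csv(path: str, **kwargs) -> List[Pair]:
    """Read one CSV file into sentence pairs."""
    # utf-8-sig also accepts a leading byte-order mark; Twi text contains
    # non-ASCII letters such as ɛ and ɔ, so an explicit encoding is safer.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return read_pairs(csv.DictReader(f), **kwargs)


def bidirectional(pairs: List[Pair]) -> List[Tuple[str, str, str]]:
    """Expand each pair into (direction, source, target) examples for en->tw and tw->en."""
    examples = []
    for en, tw in pairs:
        examples.append(("en-tw", en, tw))
        examples.append(("tw-en", tw, en))
    return examples


def drop_overlap(train: List[Pair], test: List[Pair]) -> List[Pair]:
    """Remove training pairs whose English or Twi side exactly matches a test sentence."""
    test_en = {en for en, _ in test}
    test_tw = {tw for _, tw in test}
    return [(en, tw) for en, tw in train if en not in test_en and tw not in test_tw]
```

For example, `train = drop_overlap(load_csv("verified_data.csv"), load_csv("crowdsourced_data.csv"))` followed by `bidirectional(train)` would produce direction-tagged examples for a single model that translates both ways. Training one model per direction would instead filter on the tag. The overlap check uses exact string matches only, so it will not catch near-duplicates.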
Provided by: Zenodo
Created: 2021-01-10