ENGLISH-AKUAPEM TWI PARALLEL CORPUS
Source: Zenodo | Updated: 2021-01-10 | Indexed: 2026-05-25
Download link:
https://zenodo.org/record/4430881
Description:
This dataset (verified_data.csv) is a bilingual machine translation training corpus of 25,421 English-Akuapem Twi sentence pairs.
A transformer-based machine translation model generated the initial Akuapem Twi translations, which native speakers then verified and corrected where necessary.
The dataset is primarily intended for further training of machine translation models for Akuapem Twi.
With appropriate additional annotation, the data can also be used for other downstream NLP tasks such as named entity recognition (NER) and part-of-speech (POS) tagging.
Another potential application is training unsupervised embeddings for Akuapem Twi.
In addition, a higher-quality set of 697 crowdsourced sentences (crowdsourced_data.csv) is provided as an evaluation set for the tasks above. It is recommended as a test set for English-to-Twi and Twi-to-English machine translation models.
Acknowledgement: This project was supported by the AI4D language dataset fellowship through K4all and Zindi Africa.
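As a minimal sketch of the train/evaluate workflow described above (not code shipped with the dataset), the Python below reads sentence pairs from either CSV file, derives examples for both translation directions, and defensively drops any training pair whose English sentence also appears in the evaluation set. The column names `English` and `Twi` are assumptions, not taken from the dataset documentation; check them against the real header rows before use. The helper names (`load_pairs`, `load_pairs_file`, `both_directions`, `drop_overlap`) are hypothetical.

```python
import csv
from typing import Iterable, List, Tuple

# ASSUMPTION: hypothetical column names; verify against the actual header
# row of verified_data.csv and crowdsourced_data.csv.
SRC_COL = "English"
TGT_COL = "Twi"

Pair = Tuple[str, str]


def load_pairs(lines: Iterable[str], src_col: str = SRC_COL,
               tgt_col: str = TGT_COL) -> List[Pair]:
    """Read (English, Twi) pairs from CSV lines, skipping rows with an empty side."""
    reader = csv.DictReader(lines)
    missing = {src_col, tgt_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"CSV is missing expected columns: {sorted(missing)}")
    pairs: List[Pair] = []
    for row in reader:
        # Short rows yield None for absent fields, so fall back to "".
        src = (row[src_col] or "").strip()
        tgt = (row[tgt_col] or "").strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def load_pairs_file(path: str, **kwargs) -> List[Pair]:
    """Open a CSV file as UTF-8 (newline="" as the csv module expects) and load pairs."""
    with open(path, encoding="utf-8", newline="") as f:
        return load_pairs(f, **kwargs)


def both_directions(pairs: List[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Return English->Twi and Twi->English examples built from the same pairs."""
    en_tw = list(pairs)
    tw_en = [(tgt, src) for src, tgt in pairs]
    return en_tw, tw_en


def drop_overlap(train: List[Pair], test: List[Pair]) -> List[Pair]:
    """Remove training pairs whose English side also occurs in the test set,
    so the evaluation set stays held out."""
    test_src = {src for src, _ in test}
    return [pair for pair in train if pair[0] not in test_src]
```

For example, `train = load_pairs_file("verified_data.csv")` and `test = load_pairs_file("crowdsourced_data.csv")`, followed by `drop_overlap(train, test)` and `both_directions(...)` on each split. The files are opened as UTF-8 because Akuapem Twi orthography uses non-ASCII letters such as ɛ and ɔ.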
Provider:
Zenodo
Created:
2021-01-10