Akan–English Maternal Health Parallel Text Corpus for Machine Translation
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/smvdyf9fgm
Description:
This dataset contains a curated bilingual parallel corpus developed to support domain-specific neural machine translation (NMT) for maternal health communication between Akan and English. The corpus was constructed to address the scarcity of healthcare-specific parallel data for low-resource African languages, particularly Akan.
The dataset comprises 20,101 cleaned English–Akan parallel sentence pairs, of which 12,100 pairs (60.2%) originate from maternal health content covering prenatal and postnatal care domains, and 8,006 pairs (39.8%) are drawn from general-domain sources to enhance linguistic diversity and model robustness. Maternal health topics represented include antenatal care, childbirth preparation, maternal mental health, nutrition, vaccination, medication use, lifestyle behaviours, preventive medicine, personal hygiene, and common pregnancy-related conditions.
Value of the Dataset
This dataset provides one of the first domain-specific maternal health resources for Akan that integrates both parallel text and aligned speech data, enabling research in neural machine translation, speech recognition, text-to-speech, and multimodal health communication systems. It supports the development of inclusive digital health tools such as maternal health chatbots and voice-based systems designed for Akan-speaking communities and other low-resource language contexts.
This corpus was developed as part of the Ɔbaa Panin Project, which seeks to build a conversational maternal health chatbot in Akan. The Ɔbaa Panin Project is funded by Google Research.
Created:
2026-02-24



