Akan–English Maternal Health Parallel Text Corpus for Machine Translation

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://data.mendeley.com/datasets/smvdyf9fgm

Description:

This dataset contains a curated bilingual parallel corpus developed to support domain-specific neural machine translation (NMT) for maternal health communication between Akan and English. The corpus was constructed to address the scarcity of healthcare-specific parallel data for low-resource African languages, particularly Akan. The dataset comprises 20,101 cleaned English–Akan parallel sentence pairs, of which 12,100 pairs (60.2%) originate from maternal health content covering prenatal and postnatal care domains, and 8,006 pairs (39.8%) are drawn from general-domain sources to enhance linguistic diversity and model robustness. Maternal health topics represented include antenatal care, childbirth preparation, maternal mental health, nutrition, vaccination, medication use, lifestyle behaviours, preventive medicine, personal hygiene, and common pregnancy-related conditions.

Value of the Dataset

This dataset provides one of the first domain-specific maternal health resources for Akan that integrates both parallel text and aligned speech data, enabling research in neural machine translation, speech recognition, text-to-speech, and multimodal health communication systems. It supports the development of inclusive digital health tools such as maternal health chatbots and voice-based systems designed for Akan-speaking communities and other low-resource language contexts.

This corpus was developed as part of the Ɔbaa Panin Project, which seeks to build a conversational maternal health chatbot in Akan. The Ɔbaa Panin Project is funded by Google Research.
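Because the corpus mixes maternal health and general-domain pairs, a train/test split for NMT experiments may want to preserve that mix. The listing does not describe the file layout, so the following is only a minimal sketch under assumptions: the record keys `en`, `ak`, and `domain`, the domain labels, and the helper names `domain_shares` and `stratified_split` are all hypothetical, and the toy records are placeholders rather than corpus content.

```python
import random
from collections import Counter, defaultdict


def domain_shares(pairs):
    """Return the fraction of sentence pairs carrying each domain label."""
    counts = Counter(p["domain"] for p in pairs)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def stratified_split(pairs, test_fraction=0.1, seed=0):
    """Split pairs into (train, test), sampling each domain separately
    so both parts keep roughly the same domain mix."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_domain = defaultdict(list)
    for p in pairs:
        by_domain[p["domain"]].append(p)

    train, test = [], []
    for items in by_domain.values():
        items = items[:]  # copy so the caller's list order is untouched
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Placeholder records only (assumed schema); not actual corpus sentences.
toy = [
    {"en": f"maternal en {i}", "ak": f"maternal ak {i}", "domain": "maternal_health"}
    for i in range(6)
] + [
    {"en": f"general en {i}", "ak": f"general ak {i}", "domain": "general"}
    for i in range(4)
]

shares = domain_shares(toy)
train, test = stratified_split(toy, test_fraction=0.5)
```

Splitting per domain rather than over the whole list keeps a small test set from being dominated by one domain by chance, which matters when evaluating a model on maternal health text specifically.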

Created:

2026-02-24
