
Amharic-English Parallel Corpus for Neural Machine Translation

Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/amharic-english-parallel-corpus-neural-machine-translation
Description:
Amharic is the working language of Ethiopia and, as a Semitic language, is known for its complex morphology. It is also under-resourced, which poses significant challenges for natural language processing tasks such as machine translation. This study proposes a Transformer-based Amharic-to-English neural machine translation model that uses character-level embeddings and integrates several regularization techniques: dropout, L1, L2, and Elastic Net. By working at the character level, the model captures the intricate morphological patterns of Amharic and handles out-of-vocabulary words effectively. The model significantly outperforms the previous state of the art on the Amharic-to-English neural machine translation benchmark, achieving a BLEU score of 40.59, which is 7% higher than the previous best result. Among the regularization techniques tested, combining L2 regularization with dropout applied to the point-wise feed-forward network yielded the best translation performance. The proposed model also reduces the parameter count from 75 million to 5.4 million, a substantial gain in computational efficiency with high accuracy maintained. Extensive experiments showed improvements in test accuracy, loss, and translation fidelity over word-level embedding models. The research offers insights into handling low-resource, morphologically complex languages and points to directions for future work, including multilingual models, attention mechanism optimization, and broader application of hybrid regularization techniques in the Transformer architecture.
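The description above says character-level input handles out-of-vocabulary words and that L1, L2, and Elastic Net penalties were compared. The following is a minimal sketch of both ideas, not the study's implementation: the function names, the reserved `<pad>`/`<unk>` ids, the toy sentences, and the penalty coefficients are all assumptions made for illustration.

```python
# Illustrative sketch only; all names and sample data are hypothetical.

def build_char_vocab(sentences):
    """Map each character seen in the training sentences to an integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids (an assumption)
    for ch in sorted({ch for s in sentences for ch in s}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Encode text as character ids, falling back to <unk> for unseen characters."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]


def elastic_net_penalty(weights, l1=0.0, l2=0.0):
    """Return l1 * sum(|w|) + l2 * sum(w^2).

    Setting l2=0 gives a pure L1 penalty; setting l1=0 gives a pure L2 penalty.
    """
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


# Toy training data: two short Amharic sentences.
train = ["ሰላም ለዓለም", "ሰላም ነው"]
char_vocab = build_char_vocab(train)
word_vocab = {w for s in train for w in s.split()}

# A word that never appears in training but is built from seen characters.
unseen = "ሰው"
char_ids = encode(unseen, char_vocab)

print(unseen in word_vocab)              # word-level lookup fails
print(char_vocab["<unk>"] in char_ids)   # character-level encoding needs no <unk>
print(elastic_net_penalty([1.0, -2.0], l1=0.5, l2=0.25))
```

In this toy setting the unseen word is out of vocabulary at the word level but fully representable at the character level, because each Ethiopic character is a single Unicode code point that already occurs in the training text. In a real training loop the penalty would be added to the translation loss over the chosen layer's weights.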
Provided by:
Asefa, Surafiel Habib; Assabie, Yaregal