Amharic-English Parallel Corpus for Neural Machine Translation
Source: IEEE · Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/amharic-english-parallel-corpus-neural-machine-translation
Description:
Amharic is the working language of Ethiopia and, owing to its Semitic characteristics, is known for its complex morphology. It is also an under-resourced language, which poses significant challenges for natural language processing tasks such as machine translation. This study proposes a Transformer-based Amharic-to-English neural machine translation model that uses character-level embeddings and integrates regularization techniques including dropout and L1, L2, and Elastic Net penalties. By working at the character level, the model captures the intricate morphological patterns of Amharic and handles out-of-vocabulary words effectively. The model improves on the previous state-of-the-art result on the Amharic-to-English neural machine translation benchmark, achieving a BLEU score of 40.59, which is 7% higher than the previous best. Among the regularization techniques tested, combining L2 regularization with dropout applied to the point-wise feed-forward network yielded the best translation performance. The proposed model also reduces the parameter count from 75 million to 5.4 million, a substantial gain in computational efficiency achieved while maintaining high accuracy. Extensive experiments showed improvements in test accuracy, loss reduction, and translation fidelity compared with word-level embedding models. This research offers insights into the challenges of low-resource, morphologically complex languages and points to directions for future work, including multilingual models, attention-mechanism optimization, and broader application of hybrid regularization techniques in the Transformer architecture.
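The description credits character-level embeddings with handling out-of-vocabulary (OOV) words and compares L1, L2, and Elastic Net penalties. The plain-Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The helper names (`build_word_vocab`, `build_char_vocab`, `encode_words`, `encode_chars`, `elastic_net_penalty`), the reserved `<pad>`/`<unk>` ids, and the two-coefficient Elastic Net form are hypothetical choices for illustration. The toy training set holds ቤት ("house") and ልጆች ("children"). The unseen plural ቤቶች ("houses") collapses to a single `<unk>` at word level, while the character encoder still recovers two of its three Ethiopic characters.

```python
# Hypothetical sketch: not the authors' code, vocabulary, or hyperparameters.
from typing import Dict, List, Sequence

PAD, UNK = "<pad>", "<unk>"  # assumed reserved tokens (ids 0 and 1)


def build_word_vocab(sentences: Sequence[str]) -> Dict[str, int]:
    """Assign an id to every whitespace-separated word seen in training."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def build_char_vocab(sentences: Sequence[str]) -> Dict[str, int]:
    """Assign an id to every character (Unicode code point) seen in training.
    Each Ethiopic syllable, such as ቤ, is a single code point."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for ch in sentence:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode_words(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each word to its id, falling back to <unk> for unseen words."""
    return [vocab.get(w, vocab[UNK]) for w in text.split()]


def encode_chars(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map each character to its id, falling back to <unk> for unseen ones."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


def elastic_net_penalty(weights: Sequence[float], l1: float, l2: float) -> float:
    """Return l1 * sum(|w|) + l2 * sum(w^2).
    l1=0 reduces to a pure L2 penalty; l2=0 reduces to a pure L1 penalty."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


if __name__ == "__main__":
    train = ["ቤት", "ልጆች"]  # "house", "children"
    words, chars = build_word_vocab(train), build_char_vocab(train)
    unseen = "ቤቶች"  # "houses": absent from the toy training data
    print(encode_words(unseen, words))  # [1]: the whole word is <unk>
    print(encode_chars(unseen, chars))  # [2, 1, 6]: only ቶ is <unk>
    print(elastic_net_penalty([0.5, -1.0, 2.0], l1=0.1, l2=0.01))
```

In a Transformer, such a penalty would typically be added to the cross-entropy loss before backpropagation. Which weight matrices the authors penalized, beyond the mention of the point-wise feed-forward network, and with what coefficients, is not stated in this description.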
Provided by:
Asefa, Surafiel Habib; Assabie, Yaregal



