a3xrfgb/amharic-sentences-corpus
Hugging Face · Updated 2026-02-27 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/a3xrfgb/amharic-sentences-corpus
Resource description:
---
license: apache-2.0
---

# Amharic Sentences Corpus V1.0
# Source: [Telegram](https://et.tgstat.com)
This corpus of 1.6 million Amharic sentences reflects Amharic usage as of December 20, 2025, and is designed for anyone interested in:
- Training Amharic-based LLMs
- Fine-tuning NLP models
- Building search, summarization, or generative systems in Amharic

The dataset is heavily cleaned and normalized, but like any serious LLM dataset, it still needs proper tokenization before pre-training. I recommend using an Amharic-specific tokenizer such as https://pypi.org/project/amharic-tokenizer/0.2.6

Other useful Amharic corpora: the Amharic sentences corpus by [@rasyosef](https://huggingface.co/datasets/rasyosef/amharic-sentences-corpus) and the Amharic Wikipedia dataset by [@Addis AI](https://huggingface.co/datasets/addisai/wikipedia-amharic).

# How did I create this text corpus?
- I vibe-coded a simple yet powerful script that creates sentences from .json files that I downloaded from Telegram channels.

If you want to create your own text corpus, feel free to use my script:
https://github.com/a3xrfgb/HuggingFace_dataset_creator
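
The linked script is the reference. Purely as an illustration of the idea, a minimal sketch might look like the one below. It is not taken from that script: the function names are hypothetical, and it assumes the Telegram Desktop JSON export layout (a top-level `messages` list whose `text` field is either a string or a list mixing plain strings with entity objects that carry their own `text` key). It splits on the Ethiopic full stop `።`, the Ethiopic question mark `፧`, and the Latin `?` and `!`, then drops exact duplicates.

```python
import re

# One sentence = a run of non-terminator characters followed by any
# terminators (Ethiopic full stop, Ethiopic question mark, "?", "!").
# The terminator set is an assumption; adjust it to your data.
SENTENCE = re.compile(r"[^።፧?!]+[።፧?!]*")


def message_text(message):
    """Return the plain text of one exported message (hypothetical helper)."""
    text = message.get("text", "")
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        if isinstance(part, str):
            parts.append(part)
        elif isinstance(part, dict):
            # Entity objects (links, bold, ...) keep their text under "text".
            parts.append(part.get("text", ""))
    return "".join(parts)


def split_sentences(text):
    """Split text into sentences and collapse internal whitespace."""
    sentences = []
    for piece in SENTENCE.findall(text):
        cleaned = " ".join(piece.split())
        if cleaned:
            sentences.append(cleaned)
    return sentences


def sentences_from_export(export, min_chars=2):
    """Yield unique sentences from a parsed export dict, in first-seen order."""
    seen = set()
    for message in export.get("messages", []):
        for sentence in split_sentences(message_text(message)):
            if len(sentence) >= min_chars and sentence not in seen:
                seen.add(sentence)
                yield sentence
```

To try it on a real export, load the file with `json.load(open(path, encoding="utf-8"))` and pass the resulting dict to `sentences_from_export`. Real cleaning for a corpus like this one would likely need more steps (removing URLs, emoji, and non-Amharic lines), which this sketch leaves out.
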
This project is fully open-source & community-driven. If you’re building in NLP, AI research, or language technology, this is for you.
Use it. Improve it. Build on top of it.
## Topics
ethiopia / ethiopian / ethiopiandataset / ethiopianvision / ethiopianimages / ethiopianphotography / ethiopianvisuals / ethiopianculture / ethiopianart / ethiopiandigitalculture / ethiopianmachinelearning / ethiopiancomputervision / ethiopiangenerativeai / ethiopiandiffusion / ethiopianstreetphotography / ethiopianportraits / ethiopianlifestyle / ethiopianurbanculture / ethiopiancreativephotography / ethiopianvisualarchive / ethiopianmodernculture / ethiopiandigitalart / abyssinia / abyssiniandataset / abyssinianvision / abyssinianai / abyssinianimages / abyssinianvisuals / abyssinianculture / abyssinianphotography / abyssinianmosaic / abyssinianarchive / habesha / habeshaai / habeshavision / habeshaculture / habeshavisuals / sheger / shegervision / addisababa / addisvisuals / addisphotography / africanai / africancomputervision / eastafricanai / africanvisualdataset / amharic / ኢትዮጵያ / አማርኛ / ሀበሻ
Provider: a3xrfgb



