African News Corpus
Source: NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/6990608
Description:
This is a monolingual news corpus covering 19 languages, drawn from sources such as VOA, BBC, and Isolezwe.
- The BBC corpus (except for Yoruba) was extracted from the AfriBERTa corpus; please cite the AfriBERTa paper if you use it. Many thanks to Kelechi Ogueji for providing this corpus.
- The VOA corpus was extracted from the MOT corpus; please cite the MOT paper if you use it.
- The Isolezwe data (xho, zul) was crawled as part of the Lacuna NER/POS project with Masakhane; please cite the MAFAND paper if you use it.
- The nya data was part of the AI4D paper.
- We thank Jonathan Mukiibi for providing the lug news corpus.
- If you use the corpus for amh, hau, ibo, kin, lug, luo, pcm, swa, wol, or yor, please cite our MAFT paper, which describes these sources.
Created:
2022-08-17



