African News Corpus

Indexed by NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/6990608
Description:
This is a monolingual news corpus covering 19 languages, drawn from sources such as VOA, BBC, and Isolezwe.

- The BBC corpus (except for Yoruba) was extracted from the AfriBERTa corpus; please cite the AfriBERTa paper if you use it. A big thank you to Kelechi Ogueji for providing this corpus.
- The VOA corpus was extracted from the MOT corpus; please cite the MOT paper if you use it.
- The Isolezwe data (xho, zul) was crawled as part of the Lacuna NER/POS project with Masakhane; please cite the MAFAND paper for it.
- The nya data comes from the AI4D paper.
- We thank Jonathan Mukiibi for providing the lug news corpus.
- If you use the corpus for amh, hau, ibo, kin, lug, luo, pcm, swa, wol, or yor, please cite our MAFT paper, which describes the sources.
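The citation rules in the description above depend on both the language and the source of each subset, so it can help to encode them as a lookup. The sketch below is a hypothetical helper, not part of the released corpus: the function name `papers_to_cite`, the `(language_code, source)` pair layout, and the short paper labels are illustrative assumptions. It encodes only the rules stated in the description.

```python
# Hypothetical helper: encodes only the citation rules stated in the description.
# The function name, input layout, and paper labels are illustrative assumptions.

# Language subsets that require citing the MAFT paper.
MAFT_LANGS = {"amh", "hau", "ibo", "kin", "lug", "luo", "pcm", "swa", "wol", "yor"}


def papers_to_cite(subsets):
    """Return the set of papers to cite.

    subsets: iterable of (language_code, source) pairs, e.g. ("hau", "BBC").
    """
    papers = set()
    for lang, source in subsets:
        # BBC text (except Yoruba) was extracted from the AfriBERTa corpus.
        if source == "BBC" and lang != "yor":
            papers.add("AfriBERTa")
        # VOA text was extracted from the MOT corpus.
        if source == "VOA":
            papers.add("MOT")
        # Isolezwe (xho, zul) was crawled for the Lacuna NER/POS project.
        if source == "Isolezwe" and lang in {"xho", "zul"}:
            papers.add("MAFAND")
        # The nya data comes from the AI4D paper.
        if lang == "nya":
            papers.add("AI4D")
        # These language subsets are described in the MAFT paper.
        if lang in MAFT_LANGS:
            papers.add("MAFT")
    return papers


if __name__ == "__main__":
    print(sorted(papers_to_cite([("hau", "BBC"), ("zul", "Isolezwe")])))
```

Running the script prints `['AfriBERTa', 'MAFAND', 'MAFT']`: the Hausa BBC subset triggers both AfriBERTa and MAFT, and the Zulu Isolezwe subset triggers MAFAND.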
Created:
2022-08-17