Kencorpus: Kenyan Languages Corpus

Source: DataONE. Updated 2024-01-24. Indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:a4c3bbd264a9c908647c9e8d5624dfbf13e390bd1331e6e7025e840ba68888ed
Description:
This project collected text and speech corpora for languages spoken in Kenya. In the KenCorpus project, three languages were strategically selected: Kiswahili, Luhya, and Dholuo. The Luhya language has several dialects, and three of them were chosen as a start: Lumarachi, Logooli, and Lubukusu. Primary data was collected from the respective language communities, and it included indigenous stories and other narratives from student compositions, native-language media stations, and publishers. This went beyond the conventional religious texts to include other genres, making the corpus more representative of everyday language use in the communities.

Text data: A total of 4,442 texts were collected, including 546 texts for Dholuo, 483 texts for Luhya-Lumarachi, 135 texts for Luhya-Lubukusu, and 359 texts for Luhya-Logooli.

Spontaneous speech data: A total of 1,152 files were collected, amounting to 176hr 29min 46sec of spontaneous speech: 104 files (19hr 10min 57sec) for Kiswahili, 512 files (99hr 3min 8sec) for Dholuo, 138 files (15hr 37min 46sec) for Luhya-Lumarachi, 354 files (30hr 11min) for Luhya-Lubukusu, and 44 files (12hr 26min 55sec) for Luhya-Logooli.

Acknowledgement of data collectors:
Kiswahili - Rose Felynix, Khalid Kitito, Dr. Benard Okal
Luo - Jotham Ondu Ajiki, Dr. Jackline Okello, Jonathan Muga, Mercy Lavinca Oduoll
Luhya (Logooli) - Salano Odari, Dr. Phillip Lumwamu
Luhya (Bukusu) - Mactilda Nekesa Makana, Mulwale Martin
Luhya (Marachi) - Yonah Weunda
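
The per-language speech figures in the description can be cross-checked against the stated totals with a few lines of arithmetic. The following is a minimal sketch in Python. The dictionary `speech` and the helper `format_hms` are hypothetical names introduced for illustration and are not part of the dataset or any Kencorpus tooling. The numbers are copied from the description.

```python
from datetime import timedelta

# Per-language speech figures copied from the description:
# language -> (files, hours, minutes, seconds)
speech = {
    "Kiswahili": (104, 19, 10, 57),
    "Dholuo": (512, 99, 3, 8),
    "Luhya-Lumarachi": (138, 15, 37, 46),
    "Luhya-Lubukusu": (354, 30, 11, 0),
    "Luhya-Logooli": (44, 12, 26, 55),
}

# Sum the file counts and the durations across all languages.
total_files = sum(files for files, *_ in speech.values())
total_time = sum(
    (timedelta(hours=h, minutes=m, seconds=s) for _, h, m, s in speech.values()),
    timedelta(),
)


def format_hms(duration):
    """Render a timedelta in the same 'Xhr Ymin Zsec' style as the description."""
    total_seconds = int(duration.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}hr {minutes}min {seconds}sec"


print(total_files, format_hms(total_time))  # prints: 1152 176hr 29min 46sec
```

The computed totals agree with the 1,152 files and 176hr 29min 46sec stated in the description.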
Created:
2024-03-06