Dagbani Wiki Text
Source catalog: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8186834
Description:
The Dagbani Sentences Dataset is a collection of sentences in the Dagbani language, a Gur language spoken by the Dagomba people of Northern Ghana. The dataset was obtained by scraping sentences from Wikipedia articles written in Dagbani.
Content: The dataset comprises a zip file containing a single text file, in which each line is one sentence from an article on the Dagbani Wikipedia. The text file is UTF-8 encoded and covers a wide range of topics, including folklore, legends, education, politics, and health.
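The layout described above (a zip archive holding a UTF-8 text file with one sentence per line) can be read without unpacking the archive to disk. The following is a minimal sketch rather than an official loader: the function name `iter_sentences` is hypothetical, and because the listing does not give the name of the text file inside the archive, the code assumes it is the first member ending in `.txt`.

```python
import io
import zipfile
from typing import Iterator, Optional


def iter_sentences(zip_path: str, member: Optional[str] = None) -> Iterator[str]:
    """Yield each non-empty line of a UTF-8 text file stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        if member is None:
            # Assumption: the archive holds one .txt file; take the first one found.
            candidates = [n for n in archive.namelist() if n.lower().endswith(".txt")]
            if not candidates:
                raise FileNotFoundError("no .txt member found in archive")
            member = candidates[0]
        with archive.open(member) as raw:
            # Decode the byte stream as UTF-8, as stated in the description.
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                sentence = line.strip()
                if sentence:  # skip blank lines
                    yield sentence


# Example usage (the archive path is a placeholder, not the real file name):
# for sentence in iter_sentences("path/to/downloaded_archive.zip"):
#     print(sentence)
```

Streaming the lines keeps memory use low, which may matter if the sentences are fed directly into a tokenizer or a language-model training pipeline.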
Source: The sentences were extracted from Wikipedia pages written in Dagbani using web scraping techniques, and the data were collected with proper respect for the copyright and usage policies of the Wikimedia Foundation.
Use Cases: The Dagbani Sentences Dataset can be valuable for researchers, linguists, and natural language processing (NLP) practitioners interested in studying the Dagbani language. In particular, the dataset is best suited for language modelling with GPT- or BERT-style models.
Created:
2023-07-26



