
Sudanese dialect speech dataset

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/6721682
Description:
This is a speech dataset for the Sudanese dialect. The data was collected from YouTube videos that represent the characteristics of the Sudanese dialect, mainly the dialect of central Sudan (Khartoum in particular) with some northern tendencies. It draws primarily on two programs, Hajj Muzakir and Dukkan Wad Elbaseer. Transcription was done manually, by listening to the audio files repeatedly and writing captions for the collected conversations so that every word is written as the speakers said it. The transcription uses the Arabic alphabet without diacritics, in a manner that reflects the Sudanese way of speaking; for that reason, noticeable mistakes were left uncorrected, to avoid introducing bias and to keep the data representative. The 'Dataset' subdirectory contains all the audio and text files for the corpus. Files are organized by program name: 'hm_' for the Hajj Muzakir program and 'wb_' for Dukkan Wad Elbaseer. Each filename has three parts: the first is the two-letter program code ('hm' or 'wb'), the second is the episode number, and the third is the clip number. For example, hm_01_0001.wav and wb_01_0001.wav are the first clip of the first episode of each program.
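The naming convention described above can be decoded programmatically. The following is a minimal sketch, not part of the dataset itself: the names `parse_clip_name` and `ClipId` are hypothetical, and the pattern assumes the layout `<program>_<episode>_<clip>.<extension>` inferred from the two example filenames. The extension of the transcript files is not stated in the description, so any extension is accepted.

```python
import re
from typing import NamedTuple

# Program codes taken from the dataset description.
PROGRAMS = {"hm": "Hajj Muzakir", "wb": "Dukkan Wad Elbaseer"}

# Assumed layout: <program>_<episode>_<clip>.<extension>.
# Digit widths are not enforced, since the description only gives
# two examples (hm_01_0001.wav and wb_01_0001.wav).
FILENAME_RE = re.compile(r"^(hm|wb)_(\d+)_(\d+)\.(\w+)$")


class ClipId(NamedTuple):
    """Hypothetical container for the three parts of a filename."""
    program: str
    episode: int
    clip: int


def parse_clip_name(filename: str) -> ClipId:
    """Split a corpus filename into program name, episode and clip number."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    code, episode, clip, _extension = match.groups()
    return ClipId(PROGRAMS[code], int(episode), int(clip))


print(parse_clip_name("hm_01_0001.wav"))
print(parse_clip_name("wb_01_0001.wav"))
```

Grouping clips by the `program` and `episode` fields is one way to pair each audio file with its transcript, assuming both share the same base name.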
Created:
2022-12-12