UGAkan-ImpairedSpeechData: A Dataset of Impaired Speech in the Akan Language
NIAID Data Ecosystem | Indexed 2026-05-10
Download link:
https://data.mendeley.com/datasets/vc84vdw8tb
Description:
The UGAkan-ImpairedSpeechData is a speech dataset collected from indigenous speakers of Akan with different forms of speech impairment. It contains spoken descriptions of culturally relevant images. The dataset comprises 14,312 audio files with corresponding transcriptions, totalling 50.01 hours. Recordings were made in five environment categories: Outdoor (7,706 audio files), Other (3,075), Indoor (2,254), Studio (982), and Car (295).
The dataset is also categorized by aetiology and gender. Male speakers contributed 6,754 files equivalent to 19.02 hours, with the highest representation from individuals with Cerebral Palsy (2,881 files, 8.84 hours), followed by Stammering, Cleft, and Stroke. Female speakers contributed 7,558 files equivalent to 30.99 hours, with most recordings coming from individuals with Cerebral Palsy (4,835 files, 15.66 hours) and Stammering (2,574 files, 13.88 hours). Stroke data was recorded only from male speakers, while Cleft speech samples were collected from both genders, with a higher volume from males.
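As a quick consistency check on the figures quoted above, the following Python sketch confirms that the per-environment and per-gender counts add up to the stated totals, and derives the file counts and hours the description does not break out individually. All variable names are illustrative, and the "remaining" values are simple subtractions, not figures published with the dataset.

```python
# Figures copied from the dataset description; names are illustrative only.
TOTAL_FILES = 14_312
TOTAL_HOURS = 50.01

files_by_environment = {
    "Outdoor": 7_706, "Other": 3_075, "Indoor": 2_254, "Studio": 982, "Car": 295,
}
files_by_gender = {"Male": 6_754, "Female": 7_558}
hours_by_gender = {"Male": 19.02, "Female": 30.99}

environment_total = sum(files_by_environment.values())
gender_total = sum(files_by_gender.values())
hours_total = round(sum(hours_by_gender.values()), 2)

# Derived by subtraction (not stated in the description):
# male files/hours outside Cerebral Palsy (Stammering, Cleft, Stroke combined)
male_remaining_files = 6_754 - 2_881
male_remaining_hours = round(19.02 - 8.84, 2)
# female files/hours outside Cerebral Palsy and Stammering
female_remaining_files = 7_558 - 4_835 - 2_574
female_remaining_hours = round(30.99 - 15.66 - 13.88, 2)

print(environment_total == TOTAL_FILES, gender_total == TOTAL_FILES)
print(hours_total == TOTAL_HOURS)
print(male_remaining_files, male_remaining_hours)
print(female_remaining_files, female_remaining_hours)
```

The stated totals are internally consistent. The subtraction also shows that only a small share of the female recordings falls outside Cerebral Palsy and Stammering.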
The audio files vary in length. The mean duration is 12.46 s with a standard deviation of 7.71 s, indicating moderate variability. The majority of audio files range from 6.59 s to 16.00 s, while the maximum duration is 60.08 s; this long upper tail suggests a right-skewed distribution. The maximum lies well above the upper bound of the interquartile range and is likely an outlier.
Created:
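The outlier remark can be made concrete with the common Tukey fence rule (Q3 + 1.5 × IQR). The sketch below assumes that 6.59 s and 16.00 s are the first and third quartiles; the description implies this but does not state it outright, so treat the result as illustrative.

```python
# Summary statistics quoted in the description (seconds).
mean_duration = 12.46
std_duration = 7.71
max_duration = 60.08

# ASSUMPTION: the quoted 6.59 s to 16.00 s range is Q1 to Q3.
q1, q3 = 6.59, 16.00

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # Tukey's upper fence for outliers
is_outlier = max_duration > upper_fence

# Distance of the maximum from the mean, in standard deviations.
z_score = (max_duration - mean_duration) / std_duration

print(f"IQR = {iqr:.2f} s, upper fence = {upper_fence:.3f} s")
print(f"max is outlier: {is_outlier}, z-score = {z_score:.2f}")
```

Under that assumption, the 60.08 s maximum sits far beyond the fence and more than six standard deviations above the mean, which supports describing it as an outlier.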
2025-12-15



