
UGAkan-ImpairedSpeechData: A Dataset of Impaired Speech in the Akan Language

Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/vc84vdw8tb
Description:
The UGAkan-ImpairedSpeechData is a speech dataset collected from indigenous Akan speakers with various forms of speech impairment. It contains spoken descriptions of culturally relevant images. The dataset comprises 14,312 audio files with corresponding transcriptions, totalling 50.01 hours. Recordings were made in five environments: Outdoor (7,706 audio files), Other (3,075), Indoor (2,254), Studio (982), and Car (295).

The dataset is also categorized by aetiology and gender. Male speakers contributed 6,754 files (19.02 hours), with the largest share coming from individuals with Cerebral Palsy (2,881 files, 8.84 hours), followed by Stammering, Cleft, and Stroke. Female speakers contributed 7,558 files (30.99 hours), most of them from individuals with Cerebral Palsy (4,835 files, 15.66 hours) and Stammering (2,574 files, 13.88 hours). Stroke data was recorded only from male speakers, while Cleft speech samples were collected from both genders, with a higher volume from males.

Audio file durations vary. The mean length is 12.46 seconds with a standard deviation of 7.71 seconds, indicating moderate variability. The central half of the files (the interquartile range) lies between 6.59 s and 16.00 s, and the long upper tail points to a right-skewed distribution. The maximum duration is 60.08 s, well above the upper end of the interquartile range and beyond the conventional outlier fence of Q3 + 1.5 × IQR (about 30.1 s), so it is likely an outlier.
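The counts and duration figures in the description can be cross-checked against each other. The minimal Python sketch below uses only numbers copied from the description: it confirms that the per-environment and per-gender totals add up to 14,312 files and 50.01 hours, and it computes the conventional Tukey upper fence (Q3 + 1.5 × IQR) for clip length. It assumes that 6.59 s and 16.00 s are the first and third quartiles; the constant and function names are illustrative and are not part of any tooling shipped with the dataset.

```python
"""Cross-check the summary statistics quoted for UGAkan-ImpairedSpeechData.

All figures are copied from the dataset description. Treating 6.59 s and
16.00 s as the first and third quartiles (Q1, Q3) is an assumption.
"""

TOTAL_FILES = 14_312
TOTAL_HOURS = 50.01

# Files per recording environment, as listed in the description.
ENVIRONMENT_FILES = {
    "Outdoor": 7_706,
    "Other": 3_075,
    "Indoor": 2_254,
    "Studio": 982,
    "Car": 295,
}

# (files, hours) per gender, as listed in the description.
GENDER_TOTALS = {
    "Male": (6_754, 19.02),
    "Female": (7_558, 30.99),
}

Q1_SECONDS = 6.59    # assumed first quartile of clip length
Q3_SECONDS = 16.00   # assumed third quartile of clip length
MAX_SECONDS = 60.08  # longest clip reported


def tukey_upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, the conventional upper outlier fence."""
    return q3 + k * (q3 - q1)


def check_summary() -> dict:
    """Compare the breakdowns with the totals and test the maximum clip."""
    env_sum = sum(ENVIRONMENT_FILES.values())
    gender_files = sum(files for files, _ in GENDER_TOTALS.values())
    gender_hours = sum(hours for _, hours in GENDER_TOTALS.values())
    fence = tukey_upper_fence(Q1_SECONDS, Q3_SECONDS)
    return {
        "environment_files_match": env_sum == TOTAL_FILES,
        "gender_files_match": gender_files == TOTAL_FILES,
        # Hours are rounded to two decimals, so compare with a tolerance.
        "gender_hours_match": abs(gender_hours - TOTAL_HOURS) < 0.01,
        "upper_fence_seconds": fence,
        "max_is_outlier": MAX_SECONDS > fence,
    }


if __name__ == "__main__":
    for name, value in check_summary().items():
        print(f"{name}: {value}")
```

The fence works out to 16.00 + 1.5 × 9.41 ≈ 30.1 s, so the 60.08 s clip exceeds it by roughly a factor of two, which is a stronger basis for calling it an outlier than exceeding Q3 alone.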
Created: 2025-12-15