Automatic speech recognition for childhood dysarthria (Choi et al., 2026)
Source: DataCite Commons; updated 2026-04-13; indexed 2026-05-03
Download link:
https://asha.figshare.com/articles/dataset/Automatic_speech_recognition_for_childhood_dysarthria_Choi_et_al_2026_/31397457/1
Description:
Purpose: Accurate assessment of speech intelligibility is critical for children with dysarthria secondary to cerebral palsy. Traditional assessment methods, such as human listeners’ orthographic transcriptions and perceptual ratings (e.g., of ease of understanding [EoU]), are time-consuming or subjective. Automatic speech recognition (ASR) may offer a more efficient and objective alternative, but its use for assessing intelligibility in this population has not been examined. This study evaluated the potential of ASR for intelligibility assessment in children with dysarthria and identified which ASR systems best approximate human listeners’ judgments.

Method: Five ASR systems transcribed speech samples from 20 children with dysarthria. In addition, 168 adult listeners provided orthographic transcriptions and EoU ratings. Word recognition rate (WRR) served as the measure of transcription accuracy for both the ASR systems and the human listeners. Spearman correlations were used to assess the relationship between ASR WRR and human WRR and between ASR WRR and human EoU ratings.

Results: The WRRs of four ASR systems (WhisperX-small, WhisperX-medium, WhisperX-large, and Google Cloud) correlated strongly with human WRR, with WhisperX-medium showing the strongest correlation. These four systems’ WRRs also correlated moderately to strongly with EoU ratings, with Google Cloud ASR showing the strongest correlation. In contrast, the WRR of Wav2Vec2 correlated only weakly with both human WRR and EoU ratings.

Conclusions: ASR shows promise for intelligibility assessment in children with dysarthria. Of the systems tested, WhisperX-medium appears most promising for approximating human transcription accuracy, whereas Google Cloud ASR aligns best with perceptual ratings.
Such differences in ASR performance highlight the need for careful system selection in clinical applications.

Supplemental Material S1. Raw and multiple-comparison–adjusted p-values (Holm–Bonferroni) for the 10 Spearman correlations between ASR word recognition rates (WRR, %) and human perceptual measures (human WRR and ease of understanding [EoU]).

Supplemental Material S2. Word recognition rates (WRR, %) by speaker for the ASR systems and human listeners, with each speaker’s age and dysarthria severity.

Choi, J., Moya-Galé, G., Hwang, K., Hirschberg, J., & Levy, E. S. (2026). Automatic speech recognition for intelligibility assessment in children with dysarthria. Journal of Speech, Language, and Hearing Research, 69(4), 1438–1454. https://doi.org/10.1044/2025_JSLHR-25-00562
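The record names three computations: WRR as the accuracy metric, Spearman correlations between ASR and human measures, and Holm–Bonferroni adjustment of the 10 resulting p-values (5 ASR systems × 2 human measures). The sketch below illustrates those three steps in standard-library Python under stated assumptions. It is not the authors' scoring procedure. WRR is assumed here to be the percentage of target words recovered in order, matched by a longest-common-subsequence alignment after lowercasing and whitespace splitting; the study's actual scoring rules may differ. All function names are hypothetical, and the inputs are toy values rather than study data.

```python
from statistics import mean


def word_recognition_rate(target: str, transcript: str) -> float:
    """Percentage of target words recovered, in order, in a transcript.

    ASSUMPTION: words are lowercased, split on whitespace, and matched by a
    longest-common-subsequence (LCS) alignment; the study's exact scoring
    rules may differ.
    """
    ref = target.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("target must contain at least one word")
    # Row-by-row LCS table over word sequences (only the previous row is kept).
    prev = [0] * (len(hyp) + 1)
    for r in ref:
        curr = [0]
        for j, h in enumerate(hyp, start=1):
            curr.append(prev[j - 1] + 1 if r == h else max(prev[j], curr[j - 1]))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    # Raises ZeroDivisionError if either sample is constant (rho undefined).
    return cov / (sxx * syy) ** 0.5


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm–Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        # The (step + 1)-th smallest p-value is multiplied by (m - step),
        # capped at 1, and kept monotone non-decreasing.
        running_max = max(running_max, min(1.0, (m - step) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Toy inputs for illustration only (not data from the study).
print(round(word_recognition_rate("the cat sat on the mat", "the cat sat on a mat"), 1))  # 83.3
print(spearman_rho([40.0, 55.0, 70.0, 90.0], [35.0, 60.0, 65.0, 95.0]))  # 1.0
print([round(p, 3) for p in holm_adjust([0.01, 0.04, 0.03])])  # [0.03, 0.06, 0.06]
```

With real per-speaker data, one would compute one correlation per ASR system and human measure and pass the 10 raw p-values to the adjustment step. This sketch does not compute raw p-values. In practice, `scipy.stats.spearmanr` returns both rho and a p-value, and `statsmodels.stats.multitest.multipletests(..., method="holm")` performs the same Holm adjustment.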
Provider:
ASHA journals
Created:
2026-02-26



