The Korean speech recognition sentences (Song et al., 2023)
Source: DataCite Commons. Updated: 2023-09-27. Indexed: 2024-07-13.
Download link:
https://asha.figshare.com/articles/dataset/The_Korean_speech_recognition_sentences_Song_et_al_2023_/24045582
Description:
<b>Purpose:</b> The aim of this study was to develop and validate a large Korean sentence set with varying degrees of semantic predictability that can be used for testing speech recognition and lexical processing.

<b>Method:</b> Sentences differing in the degree of final-word predictability (predictable, neutral, and anomalous) were created with words selected to be suitable for both native and nonnative speakers of Korean. Semantic predictability was evaluated through a series of cloze tests in which native (n = 56) and nonnative (n = 19) speakers of Korean participated. This study also used a computer language model to evaluate final-word predictabilities; this is a novel approach that the current study adopted to reduce human effort in validating a large number of sentences, which produced results comparable to those of the cloze tests. In a speech recognition task, the sentences were presented to native (n = 23) and nonnative (n = 21) speakers of Korean in speech-shaped noise at two levels of noise.

<b>Results:</b> The results of the speech-in-noise experiment demonstrated that the intelligibility of the sentences was similar to that of related English corpora. That is, intelligibility was significantly different depending on the semantic condition, and the sentences had the right degree of difficulty for assessing intelligibility differences depending on noise levels and language experience.

<b>Conclusions:</b> This corpus (1,021 sentences in total) adds to the target languages available in speech research and will allow researchers to investigate a range of issues in speech perception in Korean.

<b>Supplemental Material S1.</b> Full list of sentences.

Song, J., Kim, B., Kim, M., & Iverson, P. (2023). “The Korean Speech Recognition Sentences: A Large Corpus for Evaluating Semantic Context and Language Experience in Speech Perception.” <i>Journal of Speech, Language, and Hearing Research</i>. Advance online publication. https://doi.org/10.1044/2023_JSLHR-23-00137
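For readers unfamiliar with cloze testing, the following is a minimal sketch of how a final-word cloze probability is commonly computed: the share of participants whose completion matches the intended final word. It is not the authors' scoring procedure, which this record does not describe; the function name `cloze_probability`, the normalization step, and the sample responses are all hypothetical.

```python
from collections import Counter


def cloze_probability(responses, target):
    """Return the proportion of non-empty responses that match the target word.

    Assumption: responses are compared after trimming whitespace and
    lowercasing. A real study may use a different matching rule.
    """
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        return 0.0  # no usable responses
    counts = Counter(normalized)
    return counts[target.strip().lower()] / len(normalized)


# Hypothetical completions from four participants for one sentence frame.
sample_responses = ["커피", "커피", "물", "커피"]
probability = cloze_probability(sample_responses, "커피")
print(probability)
```

Under this definition, a sentence with a high value would fall toward the predictable end of the scale, and one with a value near zero toward the neutral or anomalous end; the thresholds used in the study are not stated here.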
Provider:
ASHA journals
Created:
2023-08-28



