Paused Transcription Test (Lange & Matthews, 2020)

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/g278w62zpg
Description:
This dataset is a listening test for learners of English. It was designed to measure lexical segmentation, the ability to identify word boundaries in connected speech, as well as aural decoding, the ability to identify and recognize words in speech.

In a paused transcription test, the test-taker listens to an audio recording in which pauses are inserted at irregular points corresponding to the target items selected for the test. During each brief pause, the test-taker tries to transcribe the last phrase of three to five words that immediately preceded the pause. Playback then resumes, and the test-taker continues listening and transcribing the phrase heard before each subsequent pause. An advantage of the paused transcription format that is difficult to achieve with other tests of lexical segmentation is that it allows the test-taker to apply their understanding of the aural co-text, as well as their own background knowledge, to the task of transcribing the target phrase (Field, 2008). By contrast, standard dictation tests or partial dictation tests usually require the listener to provide the target items without the benefit of hearing a significant amount of the target words' surrounding co-text.

The audio for each section of the paused transcription test lasted between 10 and 12 minutes. Each section contained 12 target phrases of three words each, for a total of 180 items across the test. A 15-second pause was inserted in the audio text after the intonation unit containing each target phrase. All pauses were located in the speech of the native speaker in an effort to standardize the acoustic features of the target phrases. High-frequency vocabulary was used almost exclusively in order to minimize potential errors in lexical segmentation due to inadequate vocabulary knowledge.
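The item counts in this description can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the dataset: the number of sections is not stated outright and is inferred here from the 60 target phrases mentioned in the frequency analysis, and the function and key names are hypothetical.

```python
def layout_summary(total_items=180, words_per_phrase=3,
                   phrases_per_section=12, pause_seconds=15):
    """Derive the test layout from the figures given in the description.

    Each target word counts as one item. The section count is an
    inference (total phrases / phrases per section), not a stated fact.
    """
    total_phrases = total_items // words_per_phrase
    sections = total_phrases // phrases_per_section
    return {
        "total_phrases": total_phrases,
        "sections": sections,
        "items_per_section": phrases_per_section * words_per_phrase,
        # Total silent time added to each section by the inserted pauses.
        "pause_seconds_per_section": phrases_per_section * pause_seconds,
    }


if __name__ == "__main__":
    print(layout_summary())
```

Under these assumptions the test would consist of five sections of 36 items each, with three minutes of inserted pauses per section. The description does not say whether the stated 10 to 12 minutes includes that pause time.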
The vocabulary used in the test was analyzed for frequency in the combined COCA/BNC 1-25K corpus using the online computer program Compleat Web VP (Cobb, 2018). Results showed that 94.8% of the 5,278 tokens used in the test were within the first 1,000-word frequency band, 3.30% were in the second, 0.60% in the third, 0.30% in the fourth, 0.50% in the fifth, and 0.10% in the sixth 1,000-word frequency band, with the remaining 0.44% of words not included in the corpora (i.e., offlist). A separate frequency analysis of the 60 target phrases showed that 97.2% of the 180 target words were within the first 1,000-word frequency band, 1.70% were in the second, and 0.60% in the third. Only five target words were beyond the first 1,000-word frequency band.

Cobb, T. Compleat Web VP v.2 [computer program]. Retrieved 01 Nov 2018 from https://www.lextutor.ca/vp/comp/

Field, J. (2008). Bricks or mortar: Which parts of the input does a second language listener rely on? TESOL Quarterly, 42(3), 411–432. https://doi.org/10.1002/j.1545-7249.2008.tb00139.x
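The frequency profile above is reported as percentages only. As a rough illustration, the sketch below converts those percentages into approximate word counts. The band labels and function name are hypothetical, and the counts are estimates recovered from rounded percentages, not figures published with the dataset.

```python
# Percentages as reported in the description; band labels are illustrative.
TOKEN_BAND_PERCENT = {
    "1K": 94.8, "2K": 3.30, "3K": 0.60, "4K": 0.30,
    "5K": 0.50, "6K": 0.10, "offlist": 0.44,
}
TARGET_WORD_BAND_PERCENT = {"1K": 97.2, "2K": 1.70, "3K": 0.60}

TOTAL_TOKENS = 5278
TOTAL_TARGET_WORDS = 180


def approximate_counts(percentages, total):
    """Estimate the number of words per band from rounded percentages."""
    return {band: round(total * pct / 100) for band, pct in percentages.items()}


if __name__ == "__main__":
    print(approximate_counts(TOKEN_BAND_PERCENT, TOTAL_TOKENS))
    print(approximate_counts(TARGET_WORD_BAND_PERCENT, TOTAL_TARGET_WORDS))
    # The reported token percentages sum to slightly more than 100
    # because each figure was rounded independently.
    print(round(sum(TOKEN_BAND_PERCENT.values()), 2))
```

For the 180 target words this gives roughly 175 words in the first band, 3 in the second, and 1 in the third. That matches the statement that five words fall beyond the first band, although the description does not say which band holds the remaining word.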

Created: 2020-09-08