CSLU: S4X Release 1.2
DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2009S03
Resource description:
<h3>Introduction</h3> <p> CSLU: S4X Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2009S03 and ISBN 1-58563-523-5, was created by the Center for Spoken Language Understanding, Oregon Health and Science University (CSLU). The corpus contains recordings of 36 speakers (22 male, 14 female), each uttering 11 specified words. </p><p>The speakers repeated the following words six times on each of four channels: startrek, supernova, tektronix, generation, nebula, processing, singularity, 71523, abracadabra, sungeeta and computer. The four channels were office phone, home phone, carbon microphone telephone and speaker phone. Each speech file has a corresponding time-aligned phoneme-level transcription (produced by automatic forced alignment) and an automatically generated word-level transcription. </p><p>Human reviewers checked each utterance in two passes and classified it as good, bad, noisy or different. The results of this verification process are included in the /docs directory. </p><h3>Data</h3> <p>The data was recorded with the CSLU T1 digital data collection system. Each utterance is recorded as a separate file. These files were sampled at 8 kHz, 8-bit, and stored as ulaw files. All of the data use the RIFF standard file format. This file format is 16-bit linearly encoded. </p> <h3>Samples</h3> <p>For an example of the data in this corpus, please listen to this recording of a subject speaking the word 'computer': <a href="./desc/addenda/LDC2009S03.wav" rel="nofollow">SD-1030-computer-t3-67</a>. </p> </br>
Portions © 1996, 1998, 2000, 2002 Center for Spoken Language Understanding, Oregon Health and Science University, © 2009 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30
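
The Data section of the description mentions both 8-bit ulaw storage and 16-bit linear RIFF encoding, so the actual encoding of any given file should be checked from its header. As a minimal sketch, assuming you have already extracted raw 8-bit mu-law (G.711) sample bytes from a file, the following shows how such bytes expand to signed linear samples. The function names `ulaw_to_linear` and `decode_ulaw` are illustrative and are not part of the corpus or of any LDC tooling.

```python
def ulaw_to_linear(code: int) -> int:
    """Decode one 8-bit G.711 mu-law code to a signed linear sample."""
    BIAS = 0x84  # bias added by the mu-law encoder before segment lookup
    u = ~code & 0xFF  # mu-law bytes are stored bit-inverted
    mantissa = u & 0x0F
    exponent = (u & 0x70) >> 4
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    # The top bit of the inverted byte marks a negative sample.
    return -magnitude if u & 0x80 else magnitude


def decode_ulaw(payload: bytes) -> list[int]:
    """Decode a run of mu-law bytes (assumed already stripped of any header)."""
    return [ulaw_to_linear(b) for b in payload]


if __name__ == "__main__":
    # 0xFF is mu-law silence; 0x80 and 0x00 are the positive and negative extremes.
    print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))
```

The decoded values fit in 16 bits, which is one plausible reading of how 8-bit ulaw storage and a 16-bit linear representation could both apply to this data; that interpretation is an assumption and not something the description states.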



