CSLU: S4X Release 1.2

Name: CSLU: S4X Release 1.2
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:21:18
License: No description available

Source: DataCite Commons (updated 2021-07-01, indexed 2025-04-16)

Download link:

https://catalog.ldc.upenn.edu/LDC2009S03

Resource description:

<h3>Introduction</h3>
<p>CSLU: S4X Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2009S03 and ISBN 1-58563-523-5, was created by the Center for Spoken Language Understanding, Oregon Health and Science University (CSLU). The corpus consists of 36 speakers (22 male, 14 female) uttering 11 specified words.</p>
<p>The speakers repeated the following words six times on each of four channels: startrek, supernova, tektronix, generation, nebula, processing, singularity, 71523, abracadabra, sungeeta and computer. The four channels were office phone, home phone, carbon-microphone telephone and speakerphone. Each speech file has a corresponding time-aligned phoneme-level transcription (produced by automatic forced alignment) and an automatically generated word-level transcription.</p>
<p>Human reviewers checked each utterance in two passes and classified it as good, bad, noisy or different. The results of this verification process are included in the /docs directory.</p>
<h3>Data</h3>
<p>The data were recorded with the CSLU T1 digital data collection system. Each utterance is recorded as a separate file. These files were sampled at 8 kHz, 8-bit, and stored as μ-law (ulaw) files. All of the data use the RIFF standard file format. This file format is 16-bit linearly encoded.</p>
<h3>Samples</h3>
<p>For an example of the data in this corpus, please listen to this recording of a subject speaking the word 'computer': <a href="./desc/addenda/LDC2009S03.wav" rel="nofollow">SD-1030-computer-t3-67</a>.</p>
<br>
Portions © 1996, 1998, 2000, 2002 Center for Spoken Language Understanding, Oregon Health and Science University, © 2009 Trustees of the University of Pennsylvania
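The description mentions both 8-bit μ-law storage and a 16-bit linear RIFF format, so it may be worth checking the header of a file before processing it. The following is a minimal sketch, not part of the corpus tooling, that uses Python's standard-library `wave` module to report the header fields of a RIFF/WAVE file. The function name `describe_wav` and the returned dictionary keys are hypothetical, chosen only for illustration. The `wave` module reads uncompressed PCM only, so a μ-law-encoded file would be reported as an error rather than decoded.

```python
import wave


def describe_wav(path):
    """Return basic header fields of a PCM RIFF/WAVE file.

    Hypothetical helper for illustration; not shipped with the corpus.
    """
    try:
        with wave.open(path, "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_rate_hz": wav.getframerate(),
                # getsampwidth() is in bytes; convert to bits per sample.
                "bits_per_sample": wav.getsampwidth() * 8,
                "duration_s": wav.getnframes() / wav.getframerate(),
            }
    except wave.Error as err:
        # Raised for non-RIFF input and for encodings the wave module
        # does not support (for example, mu-law compressed data).
        return {"error": str(err)}
```

Assuming a local copy of the sample recording has been saved as `LDC2009S03.wav`, calling `describe_wav("LDC2009S03.wav")` would show whether that file is 8 kHz, 16-bit linear PCM as the description states.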

Provider:

Linguistic Data Consortium

Created:

2020-11-30
