Speech Controlled Computing

Name: Speech Controlled Computing
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:18:24
License: No description available

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC2006S30


Resource description:

<h3>Introduction</h3>

This file contains documentation on Speech Controlled Computing, Linguistic Data Consortium (LDC) catalog number LDC2006S30 and ISBN 1-58563-380-1.

The Speech Controlled Computing corpus was designed to support the development of small-footprint, embedded ASR applications in the domain of voice control for the home. It consists of recordings of 125 speakers of American English from four dialect regions, three age groups and two gender groups, pronouncing isolated words. The four primary dialect regions covered by the corpus are North, South, West and Midland, as defined by William Labov's Atlas of North American English. The three primary age groups covered by the corpus are 18-29, 30-49 and 50+.

The recordings were conducted in a sound-attenuated room at LDC with the AKG C4000B studio condenser microphone, used in its omni-directional mode. Each speaker read a randomized word list consisting of 2,100 words (100 distinct words appearing 21 times each). Speech utterances were digitized and recorded to a DAT, as well as to a hard disk drive via the Townshend DATLINK+ digital audio interface.

Speech utterances were audited as they were recorded, and any utterances detected by the recorder that were not spoken clearly or correctly were re-recorded. This included utterances with extraneous clicks, coughs, sighs and breathing that may have corrupted the recorded words. Utterances that were spoken too softly or too loudly were also re-recorded.

The digitized utterances were automatically segmented and aligned to the word list. Each utterance was then audited, and its segmentation was checked and corrected if necessary by an annotator using an auditing and segmenting tool developed by LDC.

Finally, sound files containing individual utterances were generated using the alignment and segmentation information. The sound files for this corpus were created with 100 msec of silent time before and after each utterance. Any files that contained noticeable clipping were automatically removed.

<h3>Samples</h3>

For an example of this corpus, please listen to this audio <a href="desc/addenda/LDC2006S30.wav" rel="nofollow">sample</a>.

© 2003-2006 Trustees of the University of Pennsylvania
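The word-list and file-generation steps described in the Introduction can be illustrated with a minimal Python sketch. It is not LDC's tooling: the function names, the 16 kHz sample rate used in the usage note, the 16-bit full-scale value, the "three consecutive full-scale samples" clipping rule, and the use of zero-valued samples for the silent time are all assumptions made for illustration. The source does not state the sample format, how clipping was detected, or whether the 100 msec of silence is recorded room tone or inserted padding.

```python
import random
from typing import List, Optional, Sequence


def build_word_list(words: Sequence[str], repeats: int = 21,
                    seed: Optional[int] = None) -> List[str]:
    """Repeat every distinct word `repeats` times and shuffle the result.

    With 100 distinct words and 21 repeats this yields 2,100 prompts,
    matching the list size given in the corpus description.
    """
    rng = random.Random(seed)
    prompts = [w for w in words for _ in range(repeats)]
    rng.shuffle(prompts)
    return prompts


def pad_with_silence(samples: Sequence[int], sample_rate: int,
                     pad_ms: int = 100) -> List[int]:
    """Return the utterance with `pad_ms` of zeros before and after it.

    Assumption: silence is modelled as zero-valued samples.
    """
    pad = [0] * (sample_rate * pad_ms // 1000)
    return pad + list(samples) + pad


def has_clipping(samples: Sequence[int], full_scale: int = 32767,
                 min_run: int = 3) -> bool:
    """Flag a file when `min_run` consecutive samples reach full scale.

    Assumption: 16-bit PCM and a simple run-length heuristic; the
    criterion actually used for the corpus is not documented here.
    """
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False
```

Under these assumptions, `pad_with_silence(utterance, sample_rate=16000)` adds 1,600 zero samples on each side, and a file for which `has_clipping` returns `True` would be dropped from the output set.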


Provider:

Linguistic Data Consortium

Created:

2020-11-30
