ATIS0 SD Read

Name: ATIS0 SD Read
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:34:43
License: No description available

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC93S4B-3

Resource description:

LDC93S4A (http://catalog.ldc.upenn.edu/LDC93S4A) - Complete ATIS0 corpus
LDC93S4B (http://catalog.ldc.upenn.edu/LDC93S4B) - ATIS0 Pilot
LDC93S4B-2 (http://catalog.ldc.upenn.edu/LDC93S4B-2) - ATIS0 Read
LDC93S4B-3 - ATIS0 SD-Read

The ATIS0 Corpus totals six CD-ROMs: one with spontaneous data from 36 speakers; one with read versions of the data from 20 of those speakers, along with some adaptation material; and four with extensive speaker dependent material from the ATIS domain, read by ten of the same speakers.

All ATIS speech data is recorded at 16kHz sample rate, 16-bit quantization, from two different microphones, a close-talking (Sennheiser HMD414) and a desk-top (Crown PCC-160) model.

The first disc (ATIS0 Pilot) contains spontaneous utterances elicited in a "Wizard-of-Oz" simulation, along with the relational database containing the travel information (excluding connecting flights). Thirty-six speakers produced a total of 912 utterances.

The second disc (ATIS0 Read) contains "read" versions of the spontaneous utterances for 20 of the 36 speakers above, for a total of 478 productions. This is supplemented by a set of 40 "adaptation" sentences read by each of the 20 speakers.

The third through the sixth discs (ATIS0 SD-Read) contain "read" speech in the ATIS domain for ten of the speakers on the first disc. They read a total of 3,171 utterances, or approximately 317 utterances per speaker. This data was collected for the purpose of training speaker-dependent speech recognition systems for the ATIS0 domain. Two of these four discs contain the close-talking (Sennheiser) microphone data and the other two contain corresponding data for the desk-top (Crown PCC-160) microphone. Thus there are 6,342 waveform files on the four discs.

Update

This publication has been condensed from 4 CDROM discs to a single DVDROM. The contents of each CD reside in separate directories that are organized identically to the original version.

Portions © 1993 Trustees of the University of Pennsylvania
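
The counts quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the bytes-per-second figure assumes uncompressed single-channel PCM with no file headers, which the description does not state.

```python
# Figures taken from the catalog description; all names here are illustrative.
SD_READ_UTTERANCES = 3171   # utterances read by the ten SD-Read speakers
SD_READ_SPEAKERS = 10
MICROPHONES = 2             # Sennheiser HMD414 and Crown PCC-160
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16


def corpus_figures():
    """Derive the per-speaker and per-disc-set counts for ATIS0 SD-Read."""
    # Each utterance was captured by both microphones, one file per microphone.
    waveform_files = SD_READ_UTTERANCES * MICROPHONES
    utterances_per_speaker = SD_READ_UTTERANCES / SD_READ_SPEAKERS
    # Assumption: raw, uncompressed PCM for one microphone channel.
    bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8
    return {
        "waveform_files": waveform_files,
        "utterances_per_speaker": utterances_per_speaker,
        "bytes_per_second": bytes_per_second,
    }


if __name__ == "__main__":
    for name, value in corpus_figures().items():
        print(f"{name}: {value}")
```

The file count comes out to 6,342 and the per-speaker average to 317.1, matching the "approximately 317" stated in the description.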

Provider:

Linguistic Data Consortium

Created:

2020-11-30
