Air Traffic Control Complete

DataCite Commons · Updated 2021-07-01 · Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC94S14A
Resource description:
<p>LDC94S14A - Complete ATC0 corpus<br> <a href="http://catalog.ldc.upenn.edu/LDC94S14B" rel="nofollow">LDC94S14B</a> - ATC0 Logan International<br> <a href="http://catalog.ldc.upenn.edu/LDC94S14C" rel="nofollow">LDC94S14C</a> - ATC0 Washington National<br> <a href="http://catalog.ldc.upenn.edu/LDC94S14D" rel="nofollow">LDC94S14D</a> - ATC0 Dallas Fort Worth</p><br>
<h3>Introduction</h3><br>
<p>The Air Traffic Control Corpus (ATC0) consists of recorded speech intended to support research and development in robust speech recognition for domains similar to air traffic control (several speakers, noisy channels, relatively small vocabulary, constrained language, etc.). The audio data is composed of voice communication traffic between various controllers and pilots.</p><br>
<h3>Data</h3><br>
<p>The audio files are 8 kHz, 16-bit linear sampled data, representing continuous monitoring, without squelch or silence elimination, of a single FAA frequency for one to two hours. There are also files that indicate the amplitude of the received AM carrier signal at 10 ms intervals.</p><br>
<p>Full transcripts, including the start and end times of each transmission, are provided for each audio file. Each flight is identified by its flight number.</p><br>
<p>ATC0 consists of three subcorpora, one for each airport at which the transmissions were collected: Dallas Fort Worth (DFW), Logan International (BOS) and Washington National (DCA). The complete set contains approximately 70 hours of controller and pilot transmissions collected via antennas and radio receivers located in the vicinity of the respective airports.</p><br>
<p>Detailed information regarding the collection process and the equipment used can be found in the "atc.doc" files in the "doc" directories.</p><br>
<p>The ATC0 Corpus was collected by Texas Instruments under contract to DARPA. It was produced by the National Institute of Standards and Technology for distribution by the Linguistic Data Consortium.</p><br>
<h3>Samples</h3><br>
<p>For an example of the data in this corpus, please examine the following files. The audio sample is in NIST Sphere format. Users should save this file rather than try to display it in the browser.</p><br>
<ul>
<li><a href="desc/addenda/LDC94S14A.sph" rel="nofollow">Audio</a></li>
<li><a href="desc/addenda/LDC94S14A.cdt" rel="nofollow">Carrier Detect</a></li>
<li><a href="desc/addenda/LDC94S14A.txt" rel="nofollow">Transcripts</a></li>
</ul><br>
<h3>Updates</h3><br>
<p>Relative to the CD-ROMs produced by NIST in 1994, the Sphere files now use the .sph extension instead of the .wav extension.</p><br>
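As a rough illustration of the audio format described above, the following Python sketch reads the plain-text header that begins a NIST Sphere file and derives the duration and the number of samples per 10 ms carrier frame. It assumes the usual Sphere layout (a `NIST_1A` magic line, a header-size line, then `name -type value` fields ending with `end_head`). The function names and the demo header values are hypothetical and are not taken from the corpus. The sketch reads only the header; the sample data in a given file may be compressed and need a dedicated tool to decode.

```python
import io


def read_sphere_header(stream):
    """Parse the text header of a NIST Sphere file opened in binary mode.

    Returns (header_size_in_bytes, dict_of_fields).
    """
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST Sphere file")
    header_size = int(stream.readline().strip())

    # Re-read the whole header block as text.
    stream.seek(0)
    raw = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in raw.splitlines()[2:]:  # skip magic and size lines
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # blank line or comment
            continue
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        key, kind, value = parts
        if kind == "-i":
            fields[key] = int(value)
        elif kind == "-r":
            fields[key] = float(value)
        else:  # "-sN" string fields
            fields[key] = value
    return header_size, fields


def make_demo_header(size=1024):
    """Build a synthetic header (hypothetical values, not corpus data)."""
    text = (
        "NIST_1A\n"
        "   1024\n"
        "channel_count -i 1\n"
        "sample_rate -i 8000\n"
        "sample_n_bytes -i 2\n"
        "sample_count -i 16000\n"
        "sample_coding -s3 pcm\n"
        "end_head\n"
    )
    return text.encode("ascii").ljust(size, b" ")


header_size, fields = read_sphere_header(io.BytesIO(make_demo_header()))
duration_s = fields["sample_count"] / fields["sample_rate"]
samples_per_frame = fields["sample_rate"] // 100  # one 10 ms carrier frame
print(header_size, fields["sample_rate"], duration_s, samples_per_frame)
```

For a real file, pass `open(path, "rb")` instead of the in-memory demo stream. At 8 kHz, each 10 ms carrier-amplitude reading spans 80 audio samples.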
Provider:
Linguistic Data Consortium
Created:
2020-11-30