RATS Low Speech Density

Source: DataCite Commons. Updated: 2025-06-03. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC2024S03
Resource description:
<h3>Introduction</h3>
<p>RATS Low Speech Density was developed by the Linguistic Data Consortium (LDC) and comprises approximately 87 hours of English, Levantine Arabic, Farsi, Pashto and Urdu speech and non-speech samples. The recordings were assembled by concatenating a randomized selection of speech, communications systems sounds, and silence. This corpus was created to measure false alarm performance in RATS speech activity detection systems.</p>
<p>The goal of the RATS (Robust Automatic Transcription of Speech) program was to develop human language technology systems capable of performing speech detection, language identification, speaker identification and keyword spotting on the severely degraded audio signals that are typical of various radio communication channels, especially those employing handheld portable transceiver systems. To support that goal, LDC assembled a system for the transmission, reception and digital capture of audio data that allowed a single source audio signal to be distributed and recorded over eight distinct transceiver configurations simultaneously. Those configurations included three frequency bands (high, very high and ultra high) variously combined with amplitude modulation, frequency-hopping spread spectrum, narrow-band frequency modulation, single-side-band or wide-band frequency modulation. Annotations on the clear source audio signal, e.g., time boundaries for the duration of speech activity, were projected onto the corresponding eight channels recorded from the radio receivers.</p>
<h3>Data</h3>
<p>The source audio was extracted from RATS development and progress speech activity detection sets and from RATS keyword spotting development data. It consists of conversational telephone speech recordings collected by LDC: (1) data collected for the RATS program from Levantine Arabic, Farsi, Pashto and Urdu speakers; and (2) material from the Fisher English (<a href="http://catalog.ldc.upenn.edu/LDC2004S13">LDC2004S13</a>, <a href="http://catalog.ldc.upenn.edu/LDC2005S13">LDC2005S13</a>) and <a href="http://catalog.ldc.upenn.edu/LDC2007S02">Fisher Levantine Arabic (LDC2007S02)</a> telephone studies, <a href="http://catalog.ldc.upenn.edu/LDC2006S29">Levantine Arabic QT Training Data Set 5, Speech (LDC2006S29)</a>, and <a href="http://catalog.ldc.upenn.edu/LDC2014S01">CALLFRIEND Farsi Second Edition Speech (LDC2014S01)</a>.</p>
<p>Non-speech samples were selected from communications systems sounds, including telephone network special information tones, radio selective calling signals, HF/VHF/UHF digital mode radio traffic, radio network control channel signals, two-way radio traffic containing roger beeps, and short-duration shift-key modulated handset data transmissions.</p>
<p>The data is divided into development, progress, and train sets, each with its own subdirectories.</p>
<p>All audio files are presented as single-channel, 16-bit PCM, 16000 samples per second; lossless FLAC compression is used on all files. When uncompressed, the files have "MS-WAV" (RIFF) file headers.</p>
<p>A collection of tables describing the design and assembly of the source audio files is included in the documentation accompanying this release.</p>
<h3>Sponsorship</h3>
<p>This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D10PC20016. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.</p>
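
The stated audio format (single-channel, 16-bit PCM, 16000 samples per second) can be checked once a file has been decoded from FLAC to RIFF WAV, for example with the reference `flac` decoder. The following is a minimal sketch using only Python's standard `wave` module. The function names and the synthetic test file are assumptions made for illustration and are not part of the corpus or its documentation.

```python
import os
import tempfile
import wave

# Format stated in the corpus description; the key names are illustrative.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def wav_format(path):
    """Read channel count, sample width and sample rate from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def format_mismatches(path):
    """Return {field: (actual, expected)} for every field that differs."""
    actual = wav_format(path)
    return {
        key: (actual[key], expected)
        for key, expected in EXPECTED.items()
        if actual[key] != expected
    }


def write_silence(path, seconds=1.0, sample_rate_hz=16000):
    """Write a mono 16-bit WAV of silence (a stand-in for a decoded file)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate_hz))


def demo(sample_rate_hz=16000):
    """Create a synthetic file and report any format mismatches."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.wav")
        write_silence(path, sample_rate_hz=sample_rate_hz)
        return format_mismatches(path)
```

An empty result from `format_mismatches` means the header agrees with the description. A non-empty result lists each differing field with its actual and expected values.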

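
Because the corpus is intended for measuring false alarm performance of speech activity detection, it may help to see how such a rate can be computed from time boundaries. The sketch below uses one common duration-based definition: the time a system labels as speech outside the reference speech segments, divided by the total reference non-speech time. This is an assumption for illustration. The official RATS scoring procedure is not described here and may differ, for example in how segment boundaries are treated. All function names are hypothetical, and hypothesis segments are assumed to lie within the file duration.

```python
def merge(segments):
    """Merge overlapping (start, end) segments into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a, b):
    """Total duration shared by two sorted, disjoint segment lists."""
    total, i, j = 0.0, 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def false_alarm_rate(ref_speech, hyp_speech, file_duration):
    """Fraction of reference non-speech time that the system labels as speech."""
    ref = merge(ref_speech)
    hyp = merge(hyp_speech)
    ref_duration = sum(end - start for start, end in ref)
    hyp_duration = sum(end - start for start, end in hyp)
    nonspeech_duration = file_duration - ref_duration
    if nonspeech_duration <= 0:
        raise ValueError("file contains no reference non-speech")
    false_alarm_duration = hyp_duration - overlap(ref, hyp)
    return false_alarm_duration / nonspeech_duration


# Illustrative values only: 10 s of reference speech in a 100 s file.
reference = [(10.0, 20.0)]
hypothesis = [(5.0, 12.0), (30.0, 34.0)]
rate = false_alarm_rate(reference, hypothesis, file_duration=100.0)
```

In this made-up example the system marks 9 seconds of speech outside the reference segment, against 90 seconds of reference non-speech. In recordings with little speech, nearly all of the file counts toward the denominator, so this rate reflects how the system handles non-speech audio.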
Provider:
Linguistic Data Consortium
Created:
2024-03-20