Voice of America (VOA) Czech Broadcast News Audio
Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2000S89
Resource description:
<h3>Introduction</h3><br>
<p>Voice of America (VOA) Czech Broadcast News Audio was developed by the Linguistic Data Consortium (LDC). Corresponding transcripts are contained in Voice of America (VOA) Czech Broadcast News Transcripts (<a href="../../../LDC2000T53">LDC2000T53</a>), the documentation for which is included with this release.</p><br>
<h3>Data</h3><br>
<p>Between February 9 and May 28, 1999, LDC collected approximately 30 hours of Czech broadcast audio from the Voice of America news service. The 62 data files presented in this corpus represent the audio of the daily broadcasts of 30-minute news programs.</p><br>
<p>Due to technical limitations in the hardware at LDC that was used to receive the VOA broadcasts via a satellite downlink, a number of files contain brief portions where the audio signal was interrupted. These interruptions typically yielded regions of complete silence that lasted less than two seconds and were scattered sparsely throughout an affected audio file. Additional markup was provided in the transcription texts to isolate the regions where these interruptions occurred.</p><br>
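<p>The interruptions described above could, in principle, be located directly in the decoded audio. The following is a minimal sketch, not part of the LDC release: it assumes the samples have already been decoded into a sequence of signed 16-bit integers and that "complete silence" means sample values at or near zero. The function name <code>find_silent_runs</code> and its parameters are hypothetical.</p><br>

```python
def find_silent_runs(samples, sample_rate=16000, min_seconds=0.1, threshold=0):
    """Return (start_sec, end_sec) pairs for runs where |sample| <= threshold.

    Assumption: an interruption shows up as consecutive samples at or near
    zero. Runs shorter than min_seconds are ignored so that ordinary zero
    crossings are not reported.
    """
    min_len = min_seconds * sample_rate
    runs = []
    start = None  # index where the current silent run began, if any
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a run that extends to the end of the audio.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start / sample_rate, len(samples) / sample_rate))
    return runs


# Synthetic check: 1 s of signal, 0.5 s of zeros, 1 s of signal at 16 kHz.
demo = [100] * 16000 + [0] * 8000 + [100] * 16000
print(find_silent_runs(demo))  # [(1.0, 1.5)]
```

<p>The markup in the transcripts remains the authoritative record of where interruptions occurred; a scan like this would only be a cross-check.</p><br>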
<p>The 62 audio files in this corpus are single-channel, 16 kHz, 16-bit linear SPHERE files.</p><br>
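<p>A NIST SPHERE file begins with a plain-text header: the magic line <code>NIST_1A</code>, a line giving the header size in bytes, then <code>name -type value</code> fields terminated by <code>end_head</code>. The sketch below reads such a header so the stated properties (channel count, sample rate, sample width) can be checked. The function name <code>parse_sphere_header</code> is hypothetical, and the demo header is synthetic rather than taken from this corpus.</p><br>

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header at the start of a NIST SPHERE file."""
    if not data.startswith(b"NIST_1A\n"):
        raise ValueError("not a NIST SPHERE file")
    # The second 8-byte line holds the total header size, e.g. "   1024\n".
    header_size = int(data[8:16].decode("ascii").strip())
    text = data[16:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, type_, value = parts
        if type_ == "-i":
            fields[name] = int(value)
        elif type_ == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields


# Synthetic header mirroring the properties stated for this corpus.
demo_header = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 16000\n"
    b"sample_n_bytes -i 2\n"
    b"end_head\n"
).ljust(1024, b"\n")

info = parse_sphere_header(demo_header)
print(info["channel_count"], info["sample_rate"], info["sample_n_bytes"])
```

<p>On a real file, passing the first few kilobytes read in binary mode should be enough, since the header size is declared on the second line. Decoding the audio that follows the header is a separate step and is not covered here.</p><br>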
<h3>Samples</h3><br>
<p>For an example of the data in this corpus, please review this <a href="desc/addenda/LDC2000S89.wav" rel="nofollow">audio sample</a>.</p><br>
<h3>Updates</h3><br>
<p>There are no updates at this time.</p><br>
Portions © 2000 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30



