
Korean Broadcast News Speech

Source: DataCite Commons. Updated 2021-07-01; indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2006S42
Description:
<h3>Introduction</h3> <p>This data set consists of 18 audio files recorded by LDC in January 2000 and February 2000 from Voice of America (VOA) satellite radio news broadcasts in Korean. </p><h3>Data</h3> <p>The recordings, captured from a dedicated satellite receiver, are stored as 16-bit PCM, 16-kHz, single-channel, in NIST SPHERE format. The duration of each recording is either 30 minutes or 60 minutes, depending on the VOA broadcast schedule. The date (YYYYMMDD), start-time and end-time (HHMM, Eastern Standard Time) for each recording are indicated in its file name. The sample data is not compressed. </p><p>Transcripts for these recordings are available as a separate corpus from the LDC: Korean Broadcast News Transcripts, LDC2006T14. </p><h3>Samples</h3> <p>For an example of the data contained in this corpus, please listen to this <a href="./desc/addenda/LDC2006S42.wav" rel="nofollow">audio sample</a> (wav format). </p> </br> Portions © 2000, 2006 Trustees of the University of Pennsylvania
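The format details above (16-bit PCM, 16 kHz, single channel, uncompressed) are enough to estimate how large each recording's sample data should be. The following is a minimal sketch of that arithmetic, not part of the corpus documentation. The function names `duration_minutes` and `pcm_bytes` are hypothetical, the HHMM strings are assumed to have already been extracted from a file name (the exact file-name layout is not given here), and the SPHERE header is excluded from the byte count.

```python
# Format parameters taken from the corpus description.
BYTES_PER_SAMPLE = 2      # 16-bit PCM
SAMPLE_RATE_HZ = 16_000   # 16 kHz
CHANNELS = 1              # single-channel


def duration_minutes(start_hhmm: str, end_hhmm: str) -> int:
    """Minutes between two HHMM times, allowing a wrap past midnight."""
    start = int(start_hhmm[:2]) * 60 + int(start_hhmm[2:])
    end = int(end_hhmm[:2]) * 60 + int(end_hhmm[2:])
    return (end - start) % (24 * 60)


def pcm_bytes(minutes: int) -> int:
    """Uncompressed sample-data size in bytes, excluding the SPHERE header."""
    return minutes * 60 * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for minutes in (30, 60):
        print(f"{minutes} min -> {pcm_bytes(minutes):,} bytes of sample data")
```

Under these assumptions a 30-minute recording holds 57,600,000 bytes of sample data and a 60-minute recording holds 115,200,000 bytes, so a file whose size differs substantially from these figures may be truncated or stored differently than described.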

Provider:
Linguistic Data Consortium
Created:
2020-11-30