
1996 English Broadcast News Speech (HUB4)

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC97S44
Description:
LDC97S44 - Speech data
LDC97S66 (http://catalog.ldc.upenn.edu/LDC97S66) - Dev and eval
LDC97T22 (http://catalog.ldc.upenn.edu/LDC97T22) - Transcripts

Introduction

The 1996 Broadcast News Speech Corpus contains a total of 104 hours of broadcasts from the ABC, CNN and CSPAN television networks and the NPR and PRI radio networks, with corresponding transcripts. The primary motivation for this collection is to provide training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

Data

The speech files are available as a training data set, development data and evaluation data. The following programs are represented in this corpus:

- ABC Nightline
- ABC World Nightly News
- ABC World News Tonight
- CNN Early Edition
- CNN Early Prime News
- CNN Headline News
- CNN Prime Time News
- CNN The World Today
- CSPAN Washington Journal
- NPR All Things Considered
- NPR Marketplace

Transcripts have been made of all recordings in this publication. They are manually time-aligned at the phrasal level and annotated with news story boundaries, speaker turn boundaries and speaker gender. The released transcripts are in SGML format and come with documentation and an SGML DTD file. The transcripts are available via FTP.

Updates

There are no updates at this time.

Samples

- audio (desc/addenda/LDC97S44.wav), MS Wave format.

Additional Licensing Instructions

This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.

Portions © 1996 American Broadcasting Company, Inc., Cable News Network, LP, LLLP, National Cable Satellite Corporation, National Public Radio, Inc., The University of Southern California, USC Radio and Marketplace, Trustees of the University of Pennsylvania
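
Because the transcripts are SGML with time-aligned speaker turns, a reader may want a rough idea of how such markup could be loaded. The sketch below is illustrative only: the tag and attribute names (`section`, `turn`, `time`, `speaker`, `gender`, `startTime`, `endTime`) and the sample text are assumptions made for this example, not taken from the corpus DTD, which should be consulted for the real schema. It uses Python's standard `html.parser`, which tolerates unquoted attribute values.

```python
from html.parser import HTMLParser

# Hypothetical transcript fragment. Tag and attribute names are assumptions
# for illustration; check the DTD shipped with the transcripts.
SAMPLE = """
<section type=story startTime=0.0 endTime=9.5>
<turn speaker=spk1 gender=female startTime=0.0 endTime=4.2>
good evening
<time sec=2.1>
here is the news
</turn>
<turn speaker=spk2 gender=male startTime=4.2 endTime=9.5>
thank you
</turn>
</section>
"""


class TurnCollector(HTMLParser):
    """Collects speaker turns (speaker, gender, times, text) from SGML-like markup."""

    def __init__(self):
        super().__init__()
        self.turns = []
        self._current = None  # turn being built, or None when outside a turn

    def handle_starttag(self, tag, attrs):
        if tag == "turn":
            a = dict(attrs)  # HTMLParser lowercases tag and attribute names
            self._current = {
                "speaker": a.get("speaker"),
                "gender": a.get("gender"),
                "start": float(a["starttime"]),
                "end": float(a["endtime"]),
                "text": [],
            }

    def handle_data(self, data):
        # Keep only non-blank text that falls inside a turn.
        if self._current is not None and data.strip():
            self._current["text"].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "turn" and self._current is not None:
            self._current["text"] = " ".join(self._current["text"])
            self.turns.append(self._current)
            self._current = None


def parse_turns(sgml_text):
    """Return a list of turn dictionaries found in sgml_text."""
    parser = TurnCollector()
    parser.feed(sgml_text)
    parser.close()
    return parser.turns


if __name__ == "__main__":
    for turn in parse_turns(SAMPLE):
        print(turn["speaker"], turn["gender"], turn["start"], turn["end"], turn["text"])
```

A lenient tag parser is used here rather than a validating SGML parser, so it will not enforce the DTD; it is only meant to show how turn boundaries and speaker attributes could be pulled into plain records.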
Provider:
Linguistic Data Consortium
Created:
2020-11-30