
1997 English Broadcast News Transcripts (HUB4)

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC98T28
Description:
LDC98S71 (http://catalog.ldc.upenn.edu/LDC98S71) - Speech data
LDC98T28 - Transcripts

Introduction

This publication supplements the 1996 Broadcast News Speech collection, which consists of over 100 hours of similar recordings. The primary motivation for this collection is to provide additional training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

Data

This set of 18 CD-ROMs contains a total of 97 hours of recordings from radio and television news broadcasts, gathered between June 1997 and February 1998.

All recordings in this publication have been transcribed. The transcripts are manually time-aligned at the phrasal level and annotated with boundaries between news stories, speaker turn boundaries, and the gender of the speakers. The transcription conventions are described in the file "transcrp.doc". Note that this file describes the transcription methods by reference to text formatting conventions used internally by the LDC during the transcription process. The released version of the transcripts is in SGML format, comparable to the format used in the 1996 Broadcast News Speech transcriptions. Accompanying documentation and an SGML DTD file are included with the transcription release.

Updates

There are no updates at this time.

Additional Licensing Instructions

This "members-only" corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
Provider:
Linguistic Data Consortium
Created:
2020-11-30