1997 Spanish Broadcast News Transcripts (HUB4-NE)

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC98T29
Description:
Introduction

This corpus contains a portion of the acoustic data designated as the training set for the 1997 DARPA HUB4 Spanish Benchmark. It contains speech and transcripts of 30 hours of broadcast news from the following sources: Televisa, Univision and VOA.

Corresponding speech data is released as 1997 Spanish Broadcast News Speech (HUB4-NE) (LDC98S74, http://catalog.ldc.upenn.edu/LDC98S74).

Data

All acoustic files are in NIST SPHERE format, without compression. The sample data are 16-bit linear PCM, 16 kHz sample frequency, single channel. Most files contain 30 minutes of recorded material, and some contain approximately 60 or 120 minutes. The sampling format requires roughly two megabytes (MB) per minute of recording, so file sizes are typically around 60 MB, with some files ranging up to 120 or 240 MB.

The transcripts are in SGML format, using the same markup conventions that have been applied to the other 1997 Broadcast News speech corpora (in English and Mandarin).

Samples

Please view this SGML sample (desc/addenda/LDC98T29.sgm).

Updates

There are no updates at this time.

Portions © 1997 Televisa S.A. de C.V., © 1997 Univision Network Limited Partnership, © 1997, 1998 Trustees of the University of Pennsylvania
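
The "roughly two megabytes per minute" figure in the Data description can be checked from the stated audio format (16-bit linear PCM, 16 kHz, single channel). The sketch below is an illustration only: the constant and function names are hypothetical, it ignores the small SPHERE file header, and it uses decimal megabytes (1 MB = 1,000,000 bytes).

```python
# Audio format as stated in the corpus description.
SAMPLE_RATE_HZ = 16_000   # 16 kHz sample frequency
BYTES_PER_SAMPLE = 2      # 16-bit linear PCM
CHANNELS = 1              # single channel


def pcm_bytes(minutes: float) -> int:
    """Uncompressed PCM payload size in bytes (SPHERE header not included)."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 60 * minutes)


# Recording lengths mentioned in the description, plus one minute for reference.
for minutes in (1, 30, 60, 120):
    print(f"{minutes:>3} min: {pcm_bytes(minutes) / 1e6:.1f} MB")
```

Under these assumptions one minute comes to about 1.9 MB, and the 30, 60 and 120 minute recordings come to roughly 58, 115 and 230 MB, which is consistent with the rounded sizes quoted in the description.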
Provider:
Linguistic Data Consortium
Created:
2020-11-30