Boston University Radio Speech Corpus
Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC96S36
Description:
The Boston University Radio Speech Corpus was collected primarily to support research in text-to-speech synthesis, particularly the generation of prosodic patterns. The corpus consists of professionally read radio news data, including speech and accompanying annotations, suitable for speech and language research.

The corpus includes speech from seven FM radio news announcers (four male, three female) associated with WBUR, a public radio station. The main radio news portion of the corpus consists of over seven hours of news stories recorded in the WBUR radio studio during broadcasts over a two-year period. The announcers were also recorded in a laboratory at Boston University. In this lab news portion, the announcers read a total of 24 stories from the radio news portion. They were first asked to read the stories in their non-radio style and then, 30 minutes later, to read the same stories in their radio style.

Each story read by an announcer was digitized in paragraph-size units, which typically include several sentences. The files were digitized at a 16 kHz sample rate using a 16-bit A/D converter. The paragraphs were annotated with the orthographic transcription, phonetic alignments, part-of-speech tags and prosodic markers. The orthographic transcripts were generated by hand and indicate where the speaker took a breath. The phonetic alignments and part-of-speech tags were generated automatically and hand corrected. The prosodic labels were marked by hand and are available only for a subset of the corpus.

A zipped sample file, example.zip (desc/examples/bu_radio_ldc1996s36_sample.ZIP), is available. Please be aware that this file is slightly larger than 1 MB (1,278,998 bytes). An additional sample archive, LDC1996.tgz (./desc/addenda/LDC1996S36.tgz), and a WAV sample (./desc/addenda/LDC1996S36.wav) are also available.
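As a rough illustration of what the stated digitization parameters (16 kHz sample rate, 16-bit samples) imply for storage, the sketch below estimates the size of uncompressed PCM audio. It assumes single-channel (mono) recordings and ignores file headers and annotation files; neither assumption is confirmed by the description, and the helper name `pcm_bytes` is hypothetical.

```python
# Rough storage estimate for uncompressed PCM audio.
SAMPLE_RATE_HZ = 16_000   # stated in the description
BYTES_PER_SAMPLE = 2      # 16-bit samples, stated in the description
CHANNELS = 1              # assumption: mono; not stated in the description


def pcm_bytes(duration_s: float,
              sample_rate: int = SAMPLE_RATE_HZ,
              bytes_per_sample: int = BYTES_PER_SAMPLE,
              channels: int = CHANNELS) -> int:
    """Return the raw PCM size in bytes for a recording of duration_s seconds."""
    return int(duration_s * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    per_second = pcm_bytes(1)
    seven_hours = pcm_bytes(7 * 3600)
    print(f"{per_second} bytes per second")
    print(f"{seven_hours / 1e6:.1f} MB for seven hours of audio")
```

Under these assumptions, seven hours of speech comes to roughly 0.8 GB of raw samples. Since the radio news portion is described only as "over seven hours", this is a lower bound for that portion, and the lab news recordings would add to it.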
Portions © 1996 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30



