
USC Marketplace Broadcast News Transcripts

Source: DataCite Commons. Updated 2021-07-01; indexed 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC99T36
Resource description:
Introduction

The USC Marketplace Broadcast News Corpus contains approximately 40 hours of audio data, recorded daily between May 1, 1996 and September 18, 1996. The corresponding transcript files were created by Federal Document Clearing House and enhanced by the LDC to include story boundaries, disfluency markers, and speaker and gender identification. In keeping with HUB4-style transcription conventions, LDC spelled out all digit strings in standard orthography. Commercial and music segments, while part of the audio publication, were excluded from the transcripts. The timestamps mark the beginning of each speaker turn relative to the beginning of the recording and are precise to one hundredth of a second. Although the transcripts were created using HUB4 conventions, the second- and third-pass quality checks typically required by government-sponsored evaluation projects were skipped.

Data

The USC Marketplace recordings from the summer of 1996 were received on digital audio tapes (DATs) from the University of Southern California. LDC excluded from this set the roughly seven hours of broadcast that are currently included in the 1996 English Broadcast News publication.

Marketplace is produced by USC Radio in Los Angeles, a division of the University of Southern California.

Updates

There are no updates at this time.
Provider:
Linguistic Data Consortium
Created:
2020-11-30