
Switchboard Cellular Part 2 Audio

DataCite Commons, updated 2021-07-01, indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2004S07
Resource description:
<h3>Introduction</h3><br> <p>Switchboard Cellular Part 2 Audio was developed by the Linguistic Data Consortium (LDC) and consists of approximately 200 hours of English telephone conversations collected by LDC in 2000. The Switchboard Cellular collection focused primarily on cellular phone technology of all service types. The goal was to recruit 200 subjects, balanced by gender, each taking part in ten or more five- to six-minute conversations on cellular phones. The speech data was collected for research, development, and evaluation of automatic systems for speech-to-text conversion, talker identification, language identification and speech signal detection.</p><br> <p>During the study period, LDC collected a total of 2,020 calls, or 4,040 sides (2,950 cellular), from 419 participants (2,405 sides from female speakers, 1,635 from male speakers) under varied environmental conditions.</p><br> <h3>Data</h3><br> <p>This release contains speech data files with documentation describing speaker information (sex, age, education, city and state where raised), call information (date, time, call duration, Personal Identification Numbers, topic), and audit information (channel quality, background noise). The documentation also contains reports on clipped files.</p><br> <p>Each speech file consists of a 1,024-byte ASCII-formatted SPHERE header, followed by two-channel interleaved mu-law sample data. The mu-law samples represent the actual digital data transmission from the telephone service provider (MCI), as captured separately for each side of the telephone conversation by LDC's telephone collection platform. The header also indicates the caller_pin, callee_pin, topic_id, cellular service/handset information and speaker demographic information. The data files are not compressed.</p><br> <p>Other releases in this series include:</p><br> <p>Switchboard Cellular Part 1 Audio (<a href="../../../LDC2001S13">LDC2001S13</a>)</p><br> <p>Switchboard Cellular Part 1 Transcribed Audio (<a href="../../../LDC2001S15">LDC2001S15</a>)</p><br> <p>Switchboard Cellular Part 1 Transcription (<a href="../../../LDC2001T14">LDC2001T14</a>)</p><br> <h3>Sample</h3><br> <p>Please examine this <a href="desc/addenda/LDC2004S07.wav" rel="nofollow">example audio file</a> to review a sample of this corpus.</p><br> <h3>Updates</h3><br> <p>There are no updates available at this time.</p><br> Portions © 2000, 2004 Trustees of the University of Pennsylvania
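The file layout described above (an ASCII SPHERE header followed by two-channel interleaved mu-law samples) could be read roughly as follows. This is a minimal sketch, not LDC tooling: the function names `parse_sphere_header`, `mu_law_to_pcm`, and `split_sides` are hypothetical, the header is assumed to follow the usual NIST SPHERE convention (`NIST_1A`, a size line, `name -type value` fields, `end_head`), and the decoder is the standard G.711 mu-law expansion. Which channel holds the caller and which the callee is not stated in the description, so the two channels are simply labeled A and B here.

```python
MU_LAW_BIAS = 0x84  # bias constant from the G.711 mu-law definition


def parse_sphere_header(blob: bytes) -> dict:
    """Parse the ASCII SPHERE header at the start of `blob` into a dict."""
    magic, size_line = blob[:16].decode("ascii").split("\n")[:2]
    if magic != "NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    header_size = int(size_line)  # expected to be 1024 for this corpus
    fields = {"header_size": header_size}
    text = blob[:header_size].decode("ascii", errors="replace")
    for line in text.split("\n")[2:]:
        if line.strip() == "end_head":
            break
        parts = line.split(None, 2)  # name, type code (-i, -r, -sN), value
        if len(parts) != 3:
            continue
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:  # -sN string field
            fields[name] = value.strip()
    return fields


def mu_law_to_pcm(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed linear sample (G.711)."""
    u = ~byte & 0xFF  # mu-law codes are stored bit-inverted
    mantissa = u & 0x0F
    exponent = (u >> 4) & 0x07
    magnitude = (((mantissa << 3) + MU_LAW_BIAS) << exponent) - MU_LAW_BIAS
    return -magnitude if u & 0x80 else magnitude


def split_sides(blob: bytes):
    """Return (header fields, channel A samples, channel B samples)."""
    fields = parse_sphere_header(blob)
    samples = blob[fields["header_size"]:]
    # Interleaved layout: even bytes are one channel, odd bytes the other.
    side_a = [mu_law_to_pcm(b) for b in samples[0::2]]
    side_b = [mu_law_to_pcm(b) for b in samples[1::2]]
    return fields, side_a, side_b
```

Calling `split_sides(open(path, "rb").read())` on a speech file would return the header fields (for example `caller_pin`, `callee_pin`, and `topic_id`, if present under those names) together with one list of linear samples per conversation side.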
Provider:
Linguistic Data Consortium
Created:
2020-11-30