
CSLU: Multilanguage Telephone Speech Version 1.2

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2006S35
Description:
Introduction

The Multilanguage Telephone Speech corpus consists of telephone speech in 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and Vietnamese. The corpus contains fixed-vocabulary utterances (e.g., days of the week) as well as fluent continuous speech. The current release includes recorded utterances from about 2,052 speakers, for a total of about 38.5 hours of speech. Time-aligned phonetic transcriptions for 619 of the utterances are also included.

Data

Each subject called the CSLU data collection system by dialing a toll-free number. An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.

Samples

For an example of the data in this corpus, please listen to these audio samples in Tamil (./desc/addenda/LDC2006S35_Tam.wav) and English (./desc/addenda/LDC2006S35_Eng.wav).

Portions © 1992, 2000, 2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2006 Trustees of the University of Pennsylvania
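The recording parameters given in the description (8 kHz sampling rate, 16-bit linear samples) are enough to estimate durations and storage size. The following is a minimal sketch only: it assumes headerless, mono, signed 16-bit samples, and the byte order is a guess, since the description does not state the on-disk container or endianness of the released files. The function names are hypothetical and not part of any corpus tooling.

```python
import struct

SAMPLE_RATE_HZ = 8000   # stated in the corpus description
BYTES_PER_SAMPLE = 2    # 16-bit linear samples, as stated in the description


def duration_seconds(num_bytes: int) -> float:
    """Duration of a headerless mono buffer of the given size (assumed layout)."""
    return num_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def decode_pcm16(data: bytes, little_endian: bool = True) -> list[int]:
    """Decode raw signed 16-bit samples.

    The byte order is an assumption; switch `little_endian` if the
    decoded waveform looks like noise.
    """
    count = len(data) // BYTES_PER_SAMPLE
    fmt = ("<" if little_endian else ">") + f"{count}h"
    return list(struct.unpack(fmt, data[: count * BYTES_PER_SAMPLE]))


def corpus_size_bytes(hours: float) -> int:
    """Approximate uncompressed size of `hours` of audio at these settings."""
    return int(hours * 3600 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

Under these assumptions, one second of speech occupies 16,000 bytes, so `corpus_size_bytes(38.5)` gives roughly 2.2 GB of uncompressed audio for the stated 38.5 hours.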
Provider:
Linguistic Data Consortium
Created:
2020-11-30