CALLHOME Spanish Speech
Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC96S35
Description:
<h3>Introduction</h3><br>
<p>The <a href="../../../Catalog/docs/LDC96S35/index.html" rel="nofollow">CALLHOME Spanish</a> corpus of telephone speech consists of 120 unscripted telephone conversations between native speakers of Spanish.</p><br>
<p>All calls, which lasted up to 30 minutes, originated in North America and were placed to international locations. Most participants called family members or close friends.</p><br>
<p>This corpus contains speech data files ONLY, along with the minimal amount of documentation needed to describe the contents and format of the speech files and the software packages needed to uncompress the speech data. The transcripts and documentation (<a href="http://catalog.ldc.upenn.edu/LDC96T17" rel="nofollow">LDC96T17</a>) are available separately, as is an associated lexicon (<a href="http://catalog.ldc.upenn.edu/LDC96L16" rel="nofollow">LDC96L16</a>).</p><br>
<h3>Samples</h3><br>
<p>Please listen to this <a href="desc/addenda/LDC96S35.sph">audio sample (SPH)</a>.</p><br>
<h3>Updates</h3><br>
<p>The "shorten" and "sphere" directories have been removed.</p><br>
<p>The sphere directory contained NIST "SPeech HEader REsources" (SPHERE): C-language source code libraries and utilities for manipulating NIST SPHERE-format waveform files.</p><br>
<p>The shorten directory contained files for Tony Robinson's "shorten" software for speech compression.</p><br>
<p>A more recent version of the SPHERE utilities is now available on the <a href="http://www.nist.gov/speech/tools/index.htm" rel="nofollow">NIST web site</a>; additional utilities for converting from SPHERE to other waveform file formats are also available at the <a href="http://www.ldc.upenn.edu/Using/" rel="nofollow">LDC web site</a>.</p><br>
<p>10.10.2003: It has been brought to our attention that 16 sphere files (from both the train and devtest directories) were corrupted; the problem becomes apparent when trying to decompress the files using the w_decode utility. As of June 12th, 2018, corrected versions of these files are included with the downloadable corpus. Any downloads made after this date will contain the full, corrected speech.</p><br>
Portions © 1996 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30
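Collected summary
Dataset introduction

Background and challenges
Background overview
The CALLHOME Spanish Speech dataset contains 38 hours of recordings of unscripted telephone conversations between native speakers of Spanish, 120 conversations in total, stored as 8 kHz u-law SPHERE files. The dataset is part of the CALLHOME series and is used mainly for research on speech recognition, speaker recognition, and language identification.
The summary above was collected and generated by 遇见数据集.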
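The SPHERE files described above begin with a plain-text header, so fields such as the sample rate and sample coding can be checked before any audio is decoded. The following is a minimal Python sketch of that header check, not part of the corpus or of the NIST tools. The function names parse_sphere_header and make_demo_header are hypothetical, and the demo header values are illustrative rather than copied from an actual corpus file. The sketch reads only the header; it does not decompress or decode the speech data.

```python
from typing import Dict, Tuple, Union

FieldValue = Union[int, float, str]


def parse_sphere_header(data: bytes) -> Tuple[int, Dict[str, FieldValue]]:
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Returns the header size in bytes and a dict of header fields.
    """
    first_lines = data.split(b"\n", 2)
    if len(first_lines) < 3 or first_lines[0] != b"NIST_1A":
        raise ValueError("data does not start with a NIST SPHERE header")

    # The second line holds the total header size in bytes (commonly 1024).
    header_size = int(first_lines[1].decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields: Dict[str, FieldValue] = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that are not "name -type value"
        name, field_type, value = parts
        if field_type == "-i":
            fields[name] = int(value)
        elif field_type == "-r":
            fields[name] = float(value)
        else:
            # String fields are typed "-sN", where N is the string length.
            fields[name] = value
    return header_size, fields


def make_demo_header() -> bytes:
    """Build a synthetic 1024-byte header for demonstration (illustrative values)."""
    body = (
        "NIST_1A\n"
        "   1024\n"
        "sample_rate -i 8000\n"
        "sample_n_bytes -i 1\n"
        "sample_coding -s4 ulaw\n"
        "end_head\n"
    )
    return body.encode("ascii").ljust(1024, b" ")


if __name__ == "__main__":
    size, header = parse_sphere_header(make_demo_header())
    print(size, header)
```

To inspect a real file, the first bytes can be read with open(path, "rb").read(1024) and passed to parse_sphere_header; the path is whatever location the corpus was unpacked to.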



