Call My Net 1
Source: DataCite Commons. Updated 2025-06-03; indexed 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC2024S05
Resource description:
<h3>Introduction</h3>
<p>Call My Net 1 was developed by the Linguistic Data Consortium and contains 364 hours of conversational telephone speech in four languages (Tagalog, Cebuano, Cantonese and Mandarin), collected in 2015 from 221 native speakers located in the Philippines and China, along with metadata and speaker demographic information. Recordings and data from this collection were used to support the <a href="https://www.nist.gov/itl/iad/mig/speaker-recognition-evaluation-2016">NIST 2016 Speaker Recognition Evaluation</a>.</p>
<h3>Data</h3>
<p>Speakers were recruited to make 10 telephone calls each to people within their existing social networks, using different handsets and under a variety of noise conditions. Speakers were connected through a robot operator to carry on casual conversations on topics of their choice.</p>
<p>All recordings were manually audited to confirm that they met the language and speaker requirements. The documentation for this release includes metadata about phone type, noise conditions and call quality. Speaker demographic information on year of birth, sex and native language is also included.</p>
<p>This corpus contains 2,472 telephone recordings. Audio files are stored as FLAC-compressed, 2-channel, 16-bit, 8 kHz PCM.</p>
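<p>A minimal Python sketch of what the stated audio format implies: the decoded data rate, and how a stereo buffer splits into its two channels. It assumes each FLAC file has first been decoded to raw, headerless, little-endian, interleaved PCM with an external FLAC decoder. The function names are illustrative only and are not part of any corpus tooling.</p>

```python
import array
import sys

# Audio format stated in the corpus description: 2-channel, 16-bit, 8 kHz PCM.
CHANNELS = 2
SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
SAMPLE_RATE_HZ = 8000


def pcm_bytes_per_second(channels=CHANNELS, width=SAMPLE_WIDTH_BYTES,
                         rate=SAMPLE_RATE_HZ):
    """Data rate of the decoded (uncompressed) PCM stream in bytes per second."""
    return channels * width * rate


def duration_seconds(pcm: bytes) -> float:
    """Duration of a raw stereo PCM buffer, computed from its length in bytes."""
    return len(pcm) / pcm_bytes_per_second()


def split_channels(pcm: bytes):
    """Deinterleave raw 16-bit stereo PCM into (channel 0, channel 1) sample lists.

    Assumption: the FLAC file was already decoded to headerless,
    little-endian, interleaved PCM by an external decoder.
    """
    frame_bytes = CHANNELS * SAMPLE_WIDTH_BYTES
    if len(pcm) % frame_bytes:
        raise ValueError("buffer does not hold a whole number of stereo frames")
    samples = array.array("h")  # signed 16-bit integers (2 bytes on common platforms)
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # convert the assumed little-endian data to native order
    # Even-indexed samples belong to channel 0, odd-indexed to channel 1.
    return samples[0::CHANNELS].tolist(), samples[1::CHANNELS].tolist()
```

<p>At 32,000 bytes per second, 364 hours of audio would occupy about 41.9 GB once decoded; the FLAC files themselves are losslessly compressed and therefore smaller. If the 364 hours is the combined length of the 2,472 recordings, the average call lasts roughly 530 seconds (about 8.8 minutes). Which side of the conversation each channel carries is not stated here and should be checked against the corpus documentation.</p>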
<p>Metadata and demographic information are presented in tab-delimited files.</p>
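<p>A minimal sketch of loading one of the tab-delimited metadata files with Python's standard <code>csv</code> module and summarizing speaker demographics. The column names (<code>speaker_id</code>, <code>sex</code>, <code>year_of_birth</code>, <code>native_language</code>) and the sample rows are hypothetical placeholders; the real headers and values are defined in the corpus documentation.</p>

```python
import csv
import io
from collections import Counter

# Hypothetical column names for illustration; the actual headers are
# defined in the corpus documentation.
YEAR_OF_BIRTH = "year_of_birth"
NATIVE_LANGUAGE = "native_language"
COLLECTION_YEAR = 2015  # year the calls were collected


def read_tsv(stream):
    """Read a tab-delimited file with a header row into a list of dicts.

    QUOTE_NONE treats quote characters as ordinary text, so only tabs
    and newlines separate fields.
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def approximate_age(row):
    """Age in the collection year from year of birth (may be off by one)."""
    return COLLECTION_YEAR - int(row[YEAR_OF_BIRTH])


def count_by(rows, column):
    """Tally rows by the value in one column, e.g. native language."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    # Made-up sample rows, not real corpus data.
    sample = io.StringIO(
        "speaker_id\tsex\tyear_of_birth\tnative_language\n"
        "S1\tF\t1990\tCebuano\n"
        "S2\tM\t1985\tTagalog\n"
    )
    speakers = read_tsv(sample)
    print(count_by(speakers, NATIVE_LANGUAGE))
    print([approximate_age(s) for s in speakers])
```

<p>For a real file, open it with <code>open(path, newline="", encoding="utf-8")</code> before passing it to <code>read_tsv</code>; UTF-8 is an assumption about the release's encoding. <code>QUOTE_NONE</code> is chosen so that a stray quotation mark in a free-text field cannot swallow the following tabs.</p>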
Provider:
Linguistic Data Consortium
Created:
2024-05-17



