
CALLFRIEND Mandarin Chinese-Mainland Dialect

Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2024-07-13
Download link:
https://catalog.ldc.upenn.edu/LDC96S55
Description:
Introduction

CALLFRIEND Mandarin Chinese-Mainland Dialect was developed by the Linguistic Data Consortium (LDC) and consists of approximately 24 hours of unscripted telephone conversations between native speakers of the Mandarin Chinese dialect spoken in mainland China.

The CALLFRIEND series is a collection of telephone conversations in several languages conducted by LDC in support of language identification technology development. Languages covered in the collection include American English, Canadian French, Egyptian Arabic, Farsi, German, Hindi, Japanese, Korean, Mandarin Chinese, Spanish, Tamil and Vietnamese.

An updated edition of this corpus is available as CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition (LDC2018S09). The second edition updates the audio files to wav format, simplifies the directory structure and adds documentation and metadata.

Data

The corpus consists of 60 unscripted telephone conversations, each lasting between 5 and 30 minutes. The corpus also includes documentation describing speaker information (sex, age, education, callee telephone number) and call information (channel quality, number of speakers).

For each conversation, both the caller and callee are native speakers of Mandarin Chinese from Mainland China. All calls are domestic and were placed inside the continental United States and Canada.

Callers in the "Mainland" and "Taiwan" collections of CALLFRIEND Mandarin were identified primarily on the basis of specific attributes in their speech characteristic of geographic origin.

Updates

There are no updates at this time.

Portions © 1996 Trustees of the University of Pennsylvania

Provider:
Linguistic Data Consortium
Created:
2020-11-30
Collected summary
Dataset introduction
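Background and challenges
Background overview
CALLFRIEND Mandarin Chinese-Mainland Dialect is a dataset of 60 unscripted telephone conversations in Mandarin Chinese (mainland dialect), totaling about 24 hours, intended for language identification technology development. The dataset also includes detailed documentation of speaker information and call quality.
The above content was collected and summarized by 遇见数据集.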