CSLU: Foreign Accented English Release 1.2

Source: DataONE. Updated 2023-04-17; indexed 2024-06-08.

Download link:

https://search.dataone.org/view/sha256:c5dfabbd867601e562c07b287e9fe37a5ab9032025e6ab7c187841ba9c9077cc


Description:

Introduction

This file contains documentation on CSLU: Foreign Accented English Release 1.2, Linguistic Data Consortium (LDC) catalog number LDC2006S38 and ISBN 1-58563-392-5.

CSLU: Foreign Accented English Release 1.2 consists of continuous speech in English by native speakers of 22 different languages: Arabic, Cantonese, Czech, Farsi, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Malay, Polish, Portuguese (Brazilian and Iberian), Russian, Swedish, Spanish, Swahili, Tamil and Vietnamese. The corpus contains 4925 telephone-quality utterances, information about the speakers' linguistic backgrounds and perceptual judgments about the accents in the utterances.

The speakers were asked to speak about themselves in English for 20 seconds. Three native speakers of American English independently listened to each utterance and judged the speakers' accents on a 4-point scale: negligible/no accent, mild accent, strong accent and very strong accent.

This corpus is intended to support the study of the underlying characteristics of foreign accent and to enable research, development and evaluation of algorithms for the identification and understanding of accented speech. Some of the files in this corpus are also contained in CSLU: 22 Languages Corpus, LDC2005S26.

Samples

For an example of the data in this corpus, please listen to this audio sample.

Copyright

Portions © 2000-2002 Center for Spoken Language Understanding, Oregon Health & Science University, © 2007 Trustees of the University of Pennsylvania.
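As an illustration of how the three independent accent judgments on the 4-point scale described above might be combined per utterance, here is a minimal Python sketch. The numeric codes 1 to 4, the function name `summarize_ratings`, and the utterance IDs are all assumptions made for this example; they do not reflect the corpus's actual file layout or label encoding, which is documented in the LDC release itself.

```python
from collections import Counter
from statistics import median

# The four scale labels come from the corpus description; mapping them to the
# integers 1-4 is an assumption of this sketch, not the corpus's own encoding.
ACCENT_SCALE = {
    1: "negligible/no accent",
    2: "mild accent",
    3: "strong accent",
    4: "very strong accent",
}


def summarize_ratings(ratings):
    """Summarize three independent judgments for one utterance."""
    if len(ratings) != 3 or any(r not in ACCENT_SCALE for r in ratings):
        raise ValueError("expected three ratings, each in 1..4")
    # With exactly three integer ratings, the median is the middle rating.
    consensus = int(median(ratings))
    # How many judges chose the most frequent rating.
    top_count = Counter(ratings).most_common(1)[0][1]
    return {
        "median": consensus,
        "label": ACCENT_SCALE[consensus],
        "unanimous": top_count == 3,
        "majority": top_count >= 2,
    }


# Hypothetical utterance IDs and ratings, for illustration only.
example = {"utt_a": (2, 2, 3), "utt_b": (1, 3, 4)}
for utt_id, ratings in example.items():
    print(utt_id, summarize_ratings(ratings))
```

The median is used here because the scale is ordinal; a mean would treat the gaps between adjacent labels as equal, which the description does not claim. The `majority` flag gives a quick way to set aside utterances on which all three judges disagreed.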


Created:

2023-12-28

The content above was collected and summarized by 遇见数据集.