
British-English SpeechDat-Car

catalog.elra.info | Updated 2007-02-22 | Indexed 2025-01-22
Download link:
https://catalog.elra.info/en-us/repository/browse/ELRA-S0131/
Resource description:
The British English SpeechDat-Car database contains recordings of 300 British English speakers (170 males, 130 females) from 6 different regions, recorded in a car over the GSM telephone network. The database is partitioned into 115 CDs (DVDs are also available).

The speech data files are in two formats. Four of the five microphones were recorded on a computer in the boot of the car; these data are stored as uncompressed 16 kHz, 16-bit sample sequences. The fifth microphone was connected to the cell phone and was recorded on a remote machine; these data are stored as 8 kHz, 8-bit A-law sample sequences. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SpeechDat-Car format and content specifications.

Each speaker uttered the following items:
* 2 voice activation keywords
* 1 sequence of 10 isolated digits
* 7 connected digits (1 sheet number (5 digits), 1 spontaneous telephone number, 3 read telephone numbers, 1 credit card number (14/16 digits), 1 PIN code (6 digits))
* 3 dates (1 spontaneous date, e.g. birthday, 1 prompted date, 1 relative or general date expression)
* 2 word spotting phrases using an embedded application word
* 4 isolated digits
* 7 spelled words (1 spontaneous, e.g. own forename or surname, 1 directory city name, 4 real words/names, 1 artificial name for coverage)
* 1 money amount
* 1 natural number
* 7 directory assistance names (1 spontaneous, e.g. own forename or surname, 1 city of birth/growing up, 2 most frequent cities, 2 most frequent companies/agencies, 1 "forename surname")
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word-style time phrase)
* 4 phonetically rich words
* 67 application words (13 mobile phone application words, 22 IVR function keywords, 32 car products keywords)
* 2 additional language-dependent keywords
* Prompts for spontaneous speech

The following age distribution has been obtained: 119 speakers are between 16 and 30, 109 speakers are between 31 and 45, 57 speakers are between 46 and 60, and 15 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
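
The description states that the GSM-channel recordings are stored as 8 kHz, 8-bit A-law samples. As a rough illustration of what reading such data involves, the sketch below expands A-law code words to linear samples using the standard ITU-T G.711 segment rule. It assumes the signal files are headerless streams of one A-law byte per sample, which the description does not state explicitly; the function names `alaw_to_linear` and `decode_alaw` are hypothetical and are not part of any tool shipped with the corpus.

```python
def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit A-law code word (G.711) to a signed linear sample.

    The result fits in 16 bits; the largest magnitude is 32256.
    """
    code ^= 0x55                     # undo the alternate-bit inversion used by A-law
    magnitude = (code & 0x0F) << 4   # 4-bit mantissa
    segment = (code & 0x70) >> 4     # 3-bit segment number (exponent)
    if segment == 0:
        magnitude += 8               # linear segment: half-step offset only
    else:
        magnitude += 0x108           # add the implicit leading bit and half-step
        magnitude <<= segment - 1    # scale by the segment
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def decode_alaw(raw: bytes) -> list[int]:
    """Decode a byte string of A-law samples (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in raw]
```

Under the same assumption, the bytes of a signal file read in binary mode could be passed directly to `decode_alaw`. The 16 kHz, 16-bit files need no expansion, but their byte order should be taken from the accompanying SAM label file rather than guessed.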

Provider:
ELRA Catalogue of Language Resources