
Danish SpeechDat-Car - Full database

Source: catalog.elra.info. Updated: 2007-02-22. Indexed: 2025-01-22.
Download link:
https://catalog.elra.info/en-us/repository/browse/ELRA-S0132_01/
Resource description:
The Danish SpeechDat-Car database comprises recordings of 300 Danish speakers from 5 different regions (162 males, 138 females), recorded over the GSM telephone network and in a car. The database is partitioned into 15 DVDs (53 GB), plus 1 CD-ROM for, e.g., non-signal files and documentation. The speech databases made within the SpeechDat-Car project were validated by SPEX, the Netherlands, to assess their compliance with the SpeechDat-Car format and content specifications.

The speech data files are in two formats. Four of the microphones were recorded on the computer in the boot of the car; their speech data are stored as uncompressed 16-bit samples at 16 kHz. The fifth microphone was connected to the cell phone and recorded on a remote machine, with the compressed data stored as 8-bit A-law samples at 8 kHz. Each signal file is accompanied by an ASCII SAM label file which contains the relevant descriptive information.

Each speaker uttered the following items:
- 2 voice activation keywords
- 1 sequence of 10 isolated digits
- 7 connected digits: 1 sheet number (5+ digits), 1 spontaneous telephone number, 3 read telephone numbers, 1 credit card number (14-16 digits), 1 PIN code (6 digits)
- 3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date, 1 relative or general date expression
- 2 word spotting phrases using an application word (embedded)
- 4 isolated digits
- 7 spelled words: 1 spontaneous (own forename or surname), 1 spelling of directory city name, 4 real word/name, 1 artificial name for coverage
- 1 money amount
- 1 natural number
- 7 directory assistance names: 1 spontaneous (own forename or surname), 1 city of birth / growing up (spontaneous), 2 most frequent cities, 2 most frequent company/agency, 1 "forename surname"
- 9 phonetically rich sentences
- 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style)
- 4 phonetically rich words
- 67 application words: 13 mobile phone application words, 22 IVR function keywords, 32 car products keywords
- 2 additional language dependent keywords
- Prompts for spontaneous speech
- 2 additional keywords from a list of 10

The following age distribution has been obtained: 84 speakers are between 18 and 30, 99 speakers are between 31 and 45, 98 speakers are between 46 and 60, and 19 speakers are over 60.

A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
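
The GSM-channel recordings described above are stored as 8-bit A-law samples at 8 kHz, so they usually need to be expanded to linear PCM before they can be compared with the 16-bit in-car channels. The following is a minimal sketch of standard G.711 A-law expansion in Python. It is not part of the database distribution; the function names `alaw_to_linear` and `decode_alaw` are hypothetical, and it assumes the samples are already available as raw bytes (whether the signal files carry any header is not stated here and should be checked against the database documentation).

```python
def alaw_to_linear(byte: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    a = byte ^ 0x55              # undo the even-bit inversion applied by the encoder
    t = (a & 0x0F) << 4          # 4-bit mantissa, shifted into position
    seg = (a & 0x70) >> 4        # 3-bit segment (exponent)
    if seg == 0:
        t += 8                   # first segment: half-step offset only
    else:
        t += 0x108               # add the implicit leading bit plus half-step
        if seg > 1:
            t <<= seg - 1        # scale by the segment
    return t if a & 0x80 else -t  # sign bit set means a positive sample


def decode_alaw(data: bytes) -> list[int]:
    """Expand a buffer of raw A-law bytes (assumed headerless) to linear samples."""
    return [alaw_to_linear(b) for b in data]


if __name__ == "__main__":
    # Smallest and largest magnitudes of the A-law code space.
    print(decode_alaw(bytes([0xD5, 0x55, 0xAA, 0x2A])))  # [8, -8, 32256, -32256]
```

Note that the decoded samples are still at 8 kHz; aligning them with the 16 kHz in-car channels would additionally require resampling, which this sketch does not attempt.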

Provider:
catalog.elra.info