
The Consonant Challenge Corpus

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/3538222
Description:
The Consonant Challenge Corpus provides a dataset to support human-machine comparisons of consonant recognition in quiet and in noise. Twelve female and twelve male native English talkers contributed to the corpus. Each talker produced each of the 24 English consonants / b, d, g, p, t, k, s, ʃ, f, v, ð, θ, ʧ, z, ʒ, h, ʤ, m, n, ŋ, w, r, j, l / in nine vowel contexts, formed from all possible combinations of the three vowels / iː / (as in “beat”), / uː / (as in “boot”), and / æ / (as in “bat”). Each VCV was produced with both front and end stress (e.g. / ‘æ b æ / vs / æ b ‘æ /), giving a total of 24 (speakers) * 24 (consonants) * 2 (stress types) * 9 (vowel contexts) = 10368 tokens. Tokens are distributed into training, development and test sets for automatic speech recognition experiments.

The Consonant Challenge is described in this article: Cooke, M., Scharenborg, O. (2008), “The Interspeech 2008 Consonant Challenge”, Proceedings of Interspeech, Brisbane, Australia, September 2008.

The distribution consists of the following elements:

Technical description
  readme

Speech/noise waveforms
  train.zip: noise-free training data
  test.zip: the 7 test sets, practice items for perceptual tests, and MATLAB-format files containing offsets that identify the time location of the speech token within the mixture
  test_binaural.zip: 2-channel wavs with the speech and noise on separate channels (left=noise, right=speech) for test sets 2-7 (test set 1 is noise-free)
  dev.zip: development set
  dev_binaural.zip: 2-channel version of the development set

Phoneme segmentation data
  handsegm.91.mlf.txt: 91 hand-segmented VCVs in HTK format. The set contains at least three items per consonant in contexts where the first and second vowels are identical, plus 19 randomly selected VCVs.
  segmentation_training.mlf.txt: automatically generated phoneme segmentation of the clean training material in HTK format
  segmentation_testsets.zip: automatically generated phoneme segmentations of each test set in HTK format

Automatic speech recognition
  asr.zip: scripts and models
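The token count stated in the description can be reproduced by enumerating the design. The following is a minimal Python sketch, not part of the distribution: the names CONSONANTS, VOWELS, STRESS, N_SPEAKERS and vcv_items are hypothetical, and the stress labels "initial" and "final" are assumed stand-ins for the front and end stress patterns.

```python
from itertools import product

# The 24 consonants and 3 vowels listed in the corpus description.
CONSONANTS = ["b", "d", "g", "p", "t", "k", "s", "ʃ", "f", "v", "ð", "θ",
              "ʧ", "z", "ʒ", "h", "ʤ", "m", "n", "ŋ", "w", "r", "j", "l"]
VOWELS = ["iː", "uː", "æ"]
# Assumed labels for front stress and end stress.
STRESS = ["initial", "final"]
N_SPEAKERS = 24  # 12 female + 12 male talkers


def vcv_items():
    """Return every (vowel1, consonant, vowel2, stress) combination."""
    return [
        (v1, c, v2, s)
        for v1, v2 in product(VOWELS, repeat=2)  # 3 * 3 = 9 vowel contexts
        for c in CONSONANTS
        for s in STRESS
    ]


items = vcv_items()
vowel_contexts = {(v1, v2) for v1, _, v2, _ in items}
tokens_per_speaker = len(items)
total_tokens = N_SPEAKERS * tokens_per_speaker

print(len(vowel_contexts), tokens_per_speaker, total_tokens)  # 9 432 10368
```

Each talker therefore contributes 432 VCV types under this design, and 24 talkers give the 10368 tokens quoted above.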
Created:
2024-07-22