The Consonant Challenge Corpus

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/3538222


Description:

The Consonant Challenge Corpus provides a dataset to support human-machine comparisons of consonant recognition in quiet and in noise. Twelve female and 12 male native English talkers contributed to the corpus. Every speaker produced each of the 24 English consonants / b, d, g, p, t, k, s, ʃ, f, v, ð, θ, ʧ, z, ʒ, h, ʤ, m, n, ŋ, w, r, j, l / in nine vowel contexts, formed from all possible combinations of the three vowels / iː / (as in “beat”), / uː / (as in “boot”) and / æ / (as in “bat”), yielding vowel-consonant-vowel (VCV) tokens. Each VCV was produced with both front and end stress (e.g., / ˈæ b æ / vs. / æ b ˈæ /), giving a total of 24 (speakers) × 24 (consonants) × 2 (stress types) × 9 (vowel contexts) = 10,368 tokens. The tokens are divided into training, development and test sets for automatic speech recognition experiments.

The Consonant Challenge is described in: Cooke, M., Scharenborg, O. (2008), “The Interspeech 2008 Consonant Challenge”, Proceedings of Interspeech, Brisbane, Australia, September 2008.

The distribution consists of the following elements:

Technical description
  readme

Speech/noise waveforms
  train.zip: noise-free training data
  test.zip: the 7 test sets, practice items for perceptual tests, and MATLAB-format files containing offsets that give the time location of the speech token within each mixture
  test_binaural.zip: 2-channel WAV files with speech and noise on separate channels (left = noise, right = speech) for test sets 2-7 (test set 1 is noise-free)
  dev.zip: development set
  dev_binaural.zip: 2-channel version of the development set

Phoneme segmentation data
  handsegm.91.mlf.txt: 91 hand-segmented VCVs in HTK format. The set contains at least three items per consonant in contexts where the first and second vowels are identical, plus 19 randomly selected VCVs.
  segmentation_training.mlf.txt: automatically generated phoneme segmentation of the clean training material in HTK format
  segmentation_testsets.zip: automatically generated phoneme segmentations of each test set in HTK format

Automatic speech recognition
  asr.zip: scripts and models
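The 10,368-token total follows from a full factorial design over speakers, consonants, vowel contexts and stress types. The Python sketch below enumerates that design to make the arithmetic concrete. The speaker IDs (F01 to F12, M01 to M12) and the "front"/"end" stress tags are hypothetical placeholders, not the corpus's actual file-naming scheme.

```python
from itertools import product

# The 24 consonants as listed in the corpus description (IPA symbols).
CONSONANTS = ["b", "d", "g", "p", "t", "k", "s", "ʃ", "f", "v", "ð", "θ",
              "ʧ", "z", "ʒ", "h", "ʤ", "m", "n", "ŋ", "w", "r", "j", "l"]
VOWELS = ["iː", "uː", "æ"]
STRESS = ["front", "end"]  # hypothetical tags for the two stress types
# Hypothetical speaker IDs: 12 female (F01..F12) and 12 male (M01..M12).
SPEAKERS = [f"F{i:02d}" for i in range(1, 13)] + [f"M{i:02d}" for i in range(1, 13)]

# Nine vowel contexts: every ordered (V1, V2) pair drawn from the three vowels.
VOWEL_CONTEXTS = list(product(VOWELS, repeat=2))


def vcv_string(v1: str, c: str, v2: str, stress: str) -> str:
    """Render a VCV token, marking the stressed vowel with the IPA stress mark ˈ."""
    if stress == "front":
        return f"ˈ{v1} {c} {v2}"
    return f"{v1} {c} ˈ{v2}"


# One entry per (speaker, consonant, vowel context, stress type) combination.
tokens = [
    (spk, vcv_string(v1, c, v2, s))
    for spk, c, (v1, v2), s in product(SPEAKERS, CONSONANTS, VOWEL_CONTEXTS, STRESS)
]

print(len(VOWEL_CONTEXTS))  # 9
print(len(tokens))          # 10368
```

The same product also gives the per-speaker count: 24 × 9 × 2 = 432 tokens per talker.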
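The segmentation files are described only as "HTK format". Their .mlf names suggest HTK Master Label Files (MLF). A minimal parser sketch follows, assuming the standard MLF layout: a #!MLF!# header, one quoted file pattern per entry, "start end label" lines with times in 100 ns units, and a "." terminator. The token name and phone labels in the sample are hypothetical, and the label set actually used in handsegm.91.mlf.txt may differ. Extra fields such as scores are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


@dataclass
class Segment:
    start_s: Optional[float]  # None when the label line carries no times
    end_s: Optional[float]
    label: str


def parse_mlf(text: str) -> Dict[str, List[Segment]]:
    """Parse an HTK Master Label File into {file pattern: [Segment, ...]}."""
    entries: Dict[str, List[Segment]] = {}
    current: Optional[List[Segment]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            current = entries.setdefault(line.strip('"'), [])  # start a new entry
        elif line == ".":
            current = None  # end of the current entry's label list
        elif current is not None:
            fields = line.split()
            if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
                start = int(fields[0]) / HTK_UNITS_PER_SECOND
                end = int(fields[1]) / HTK_UNITS_PER_SECOND
                current.append(Segment(start, end, fields[2]))  # extra fields ignored
            else:
                current.append(Segment(None, None, fields[0]))  # label without times
    return entries


# Hypothetical sample: the token name and phone labels are illustrative only.
SAMPLE = """#!MLF!#
"*/example_vcv.lab"
0 1200000 iy
1200000 2000000 b
2000000 3500000 iy
.
"""

segments = parse_mlf(SAMPLE)["*/example_vcv.lab"]
for seg in segments:
    print(f"{seg.label}: {seg.start_s:.2f}-{seg.end_s:.2f} s")
```

Converting to seconds at parse time keeps the 100 ns unit in one place. Keeping raw integers instead would avoid any floating-point rounding if exact sample indices are needed later.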
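The binaural files carry noise on the left channel and speech on the right. The sketch below separates the two channels with Python's standard wave module. It assumes 16-bit little-endian PCM samples, which the description does not state, so check the readme before relying on it. The in-memory demo file and its 16 kHz rate are made up for illustration.

```python
import io
import struct
import wave


def split_binaural(path_or_file):
    """Return (noise, speech, sample_rate) from a 2-channel, 16-bit PCM WAV.

    Channel order follows the corpus description: left = noise, right = speech.
    """
    with wave.open(path_or_file, "rb") as wf:
        if wf.getnchannels() != 2 or wf.getsampwidth() != 2:
            raise ValueError("expected 2-channel, 16-bit PCM audio")
        n = wf.getnframes()
        rate = wf.getframerate()
        frames = wf.readframes(n)
    samples = struct.unpack(f"<{2 * n}h", frames)  # interleaved L, R, L, R, ...
    noise = list(samples[0::2])   # left channel
    speech = list(samples[1::2])  # right channel
    return noise, speech, rate


# Demo: build a tiny made-up stereo file in memory, then split it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(struct.pack("<6h", 10, 100, 20, 200, 30, 300))
buf.seek(0)

noise, speech, rate = split_binaural(buf)
print(noise, speech, rate)
```

For real files, pass a path such as one extracted from test_binaural.zip instead of the in-memory buffer.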

Created:

2024-07-22
