"TRAD-MNSC Dataset: Traditional Multilingual Noisy Speech Corpus for Speech Enhancement Research."
Source: DataCite Commons. Updated 2026-01-30. Indexed 2026-05-03.
Download link:
https://ieee-dataport.org/documents/trad-mnsc-dataset-traditional-multilingual-noisy-speech-corpus-speech-enhancement
Description:
"The Traditional Multilingual Noisy Speech Corpus (TRAD-MNSC) is a comprehensive derived dataset specifically designed for evaluating traditional signal processing-based speech enhancement algorithms. This corpus comprises 5,600 audio files spanning seven Indian languages (Telugu, Tamil, Kannada, Malayalam, Bengali, Hindi, and Marathi) across four Signal-to-Noise Ratio (SNR) levels (5, 10, 15, and 20 dB), providing 2,800 clean-noisy speech pairs. The dataset was created by systematically mixing clean speech samples from the Kaggle dataset “Audio Dataset with 10 Indian Languages” with cafeteria environmental noise from the DEMAND database using precise SNR calibration. All audio files are sampled at 16 kHz with 16-bit resolution in mono channel format. This technical documentation presents the complete dataset structure, source attribution, mathematical formulations, noise characteristics analysis, quality control verification, and usage guidelines. The TRAD-MNSC dataset addresses the critical need for standardized, multilingual test corpora in speech processing research, particularly for traditional enhancement algorithms including spectral subtraction, Wiener filtering, Kalman filtering, and subspace methods."
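The description mentions mixing clean speech with noise at calibrated SNR levels but does not reproduce the formula here. The following is a minimal sketch of the conventional power-ratio approach, not necessarily the procedure the dataset authors used. The function names (`mean_power`, `mix_at_snr`, `measured_snr_db`), the looping of short noise clips, and the synthetic sine-plus-Gaussian signals are all illustrative assumptions; real use would load the corpus audio instead, and clipping to the 16-bit range is not handled.

```python
import math
import random

SAMPLE_RATE = 16000          # matches the 16 kHz rate stated in the description
SNR_LEVELS_DB = (5, 10, 15, 20)


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean/noise power ratio equals `snr_db`, then add.

    Assumption: noise shorter than the speech is looped, longer noise is cropped.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    reps = -(-len(clean) // len(noise))          # ceiling division
    noise = (list(noise) * reps)[:len(clean)]

    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")

    # SNR_dB = 10*log10(p_clean / (gain^2 * p_noise))  ->  solve for gain
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean reference."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Synthetic stand-ins (hypothetical): a 220 Hz tone as "speech", Gaussian "noise".
rng = random.Random(0)
clean = [0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
         for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 0.1) for _ in range(SAMPLE_RATE // 2)]

for snr_db in SNR_LEVELS_DB:
    noisy = mix_at_snr(clean, noise, snr_db)
    print(f"target {snr_db} dB, measured {measured_snr_db(clean, noisy):.3f} dB")
```

Measuring the SNR back from each clean-noisy pair, as in the loop above, is one simple way to sanity-check a pair before using it to evaluate an enhancement algorithm.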
Provider:
IEEE DataPort
Created:
2026-01-30



