DHVANI-DLSE-KA: A Deep Learning-Ready Kannada Noisy Speech Dataset for Neural Speech Enhancement Using TRAD-MNSC Formulation
Source: DataCite Commons | Updated: 2026-03-28 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/dhvani-dlse-ka-deep-learning-ready-kannada-noisy-speech-dataset-neural-speech-enhancement
Resource description:
DHVANI-DLSE-KA is a rigorously verified, deep learning-ready noisy speech dataset for the Kannada language (ISO 639-1: kn), developed at the Indian Institute of Technology Kharagpur as part of a PhD research programme on multilingual speech enhancement. The name is derived from the Sanskrit and Kannada word Dhvani (ಧ್ವನಿ: sound, resonance, echo), rooted in the Indian aesthetic theory of Dhvani-siddhānta, which holds that the most profound meaning of language lies not in its literal words but in the resonant suggestion that echoes beyond them. The name is intended as a tribute to the living voice of Kannada, one of India's officially recognised classical languages.
The dataset contains 64,000 WAV audio files forming 32,000 clean–noisy pairs (one clean file and one noisy file per pair), generated at two controlled signal-to-noise ratio (SNR) levels: 10 dB and 20 dB. Noisy speech signals are produced by mixing clean Kannada speech recordings with real-world environmental noise from the DEMAND acoustic noise database, using the TRAD-MNSC mixing formulation with joint normalisation so that the target SNR is preserved in every pair. All audio files are stored as 16 kHz, mono, 16-bit PCM WAV, so they can be loaded directly in deep learning frameworks such as PyTorch, TensorFlow, and JAX.
To ensure dataset reliability, a six-stage verification pipeline was implemented. Its stages include dataset structure validation, sampling-rate verification, SNR verification, perceptual quality evaluation, and acoustic differentiation analysis between clean and noisy signals. Verification reported 100% SNR compliance across all 32,000 pairs, with average PESQ scores of 2.05 at 10 dB and 2.95 at 20 dB, consistent with the greater perceptual degradation expected at the lower SNR.
DHVANI-DLSE-KA provides a large-scale, fully verified Kannada clean–noisy speech corpus for machine learning and deep learning research in speech enhancement. It addresses the shortage of paired noisy speech datasets for Indian languages and supports the development and benchmarking of speech enhancement algorithms for the Dravidian language family.
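The description above mentions mixing clean speech with noise at a controlled SNR, followed by joint normalisation and SNR verification. This listing does not define the TRAD-MNSC formulation, so the sketch below is not that method. It is a generic power-based SNR mix in Python with NumPy, in which joint normalisation is assumed to mean scaling the clean and noisy signals by one shared factor. The function names `mix_at_snr` and `measured_snr_db`, the `peak` parameter, and the synthetic test signals are all hypothetical.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float,
               peak: float = 0.99) -> tuple[np.ndarray, np.ndarray]:
    """Mix `clean` with `noise` at `snr_db`, then scale both outputs jointly."""
    clean = clean.astype(np.float64)
    # Tile or trim the noise so that it matches the clean signal's length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise.astype(np.float64), reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both be non-silent")

    # Gain that brings the noise to the power required by the target SNR.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = clean + gain * noise

    # Assumed joint normalisation: one shared factor leaves the pair's SNR
    # unchanged while keeping both signals inside [-peak, peak].
    scale = min(1.0, peak / max(np.max(np.abs(clean)), np.max(np.abs(noisy))))
    return clean * scale, noisy * scale


def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """SNR of a pair in dB, treating (noisy - clean) as the noise component."""
    residual = noisy - clean
    return float(10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a 1 s tone at 16 kHz and 0.5 s of Gaussian noise.
    clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    noise = rng.normal(size=8000)
    for target in (10.0, 20.0):
        c, n = mix_at_snr(clean, noise, target)
        print(f"target {target} dB, measured {measured_snr_db(c, n):.4f} dB")
```

Because the same factor is applied to both signals, the ratio of clean power to residual power is unaffected by the normalisation, which is why a check like `measured_snr_db` can confirm the target SNR after scaling. Quantising to 16-bit PCM would add a small rounding error that this sketch does not model.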
Provider:
IEEE DataPort
Created:
2026-03-28



