SWARABANGLA-DLSE-BN: A Deep Learning-Ready Bengali Noisy Speech Dataset for Speech Enhancement Using TRAD-MNSC Formulation
Source: DataCite Commons | Updated: 2026-03-28 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/swarabangla-dlse-bn-deep-learning-ready-bengali-noisy-speech-dataset-speech-enhancement
Description:
SWARABANGLA-DLSE-BN is a rigorously verified, deep learning-ready noisy speech dataset for the Bengali language (ISO 639-1: bn). It was developed at the Indian Institute of Technology Kharagpur as part of a PhD research programme on multilingual speech enhancement.

The name combines two words. Swara (স্বর) means sound, note, or voice, the fundamental unit of musical and spoken expression in the Indian classical tradition. Bangla (বাংলা) is the self-name of the Bengali language and its people. Together they form Swarabangla (স্বরবাংলা), "the voice of Bangla". The name is a tribute to one of the most widely spoken languages in the world. Its speakers gave their lives in the Bengali Language Movement of 21 February 1952, demanding the right to be heard in their mother tongue, and that date is now commemorated globally as International Mother Language Day. Bengali also carries the literary legacies of Rabindranath Tagore, the first non-European Nobel Laureate in Literature, and of Kazi Nazrul Islam, the revolutionary voice of the Bengali people.

The dataset contains 64,000 WAV audio files forming 32,000 clean–noisy pairs. The pairs were generated at two controlled signal-to-noise ratio (SNR) levels, 10 dB and 20 dB. Each noisy signal is produced by mixing a clean Bengali speech recording with real-world environmental noise from the DEMAND acoustic noise database. The mixing uses the TRAD-MNSC formulation with joint normalisation, which is designed to preserve the target SNR precisely in every pair. All audio files are stored as 16 kHz, mono, 16-bit PCM WAV, so they can be used directly in deep learning pipelines built on PyTorch, TensorFlow, and JAX.
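The entry does not spell out the TRAD-MNSC equations. The sketch below therefore shows only a generic, energy-based way to mix clean speech and noise at a target SNR. The function name `mix_at_snr`, the noise tiling, and the `peak` value are hypothetical choices for illustration. The "joint normalisation" step is an assumption: it applies one common gain to both the clean and noisy signals. A shared gain rescales signal and noise equally, so the SNR is left unchanged.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float, peak: float = 0.99):
    """Hypothetical helper: mix clean speech and noise at a target SNR (dB).

    Generic energy-based mixing, NOT necessarily the exact TRAD-MNSC formulation.
    Returns (clean_out, noisy_out), both scaled by the same gain.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Assumption: tile the noise if it is shorter than the utterance, then trim.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(clean)]

    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise segment is silent")

    # Choose the noise gain so that 10*log10(p_clean / (scale**2 * p_noise)) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    noisy = clean + scale * noise

    # Assumed "joint normalisation": one gain for BOTH signals, so the louder peak
    # lands at `peak`. A shared gain scales speech and noise equally (SNR unchanged).
    gain = peak / max(np.max(np.abs(noisy)), np.max(np.abs(clean)), 1e-12)
    return clean * gain, noisy * gain
```

Usage: `c, n = mix_at_snr(speech, demand_noise, 10.0)` returns a pair whose energy ratio is 10 dB before any 16-bit quantisation. Normalising the pair jointly rather than each file separately keeps the clean target aligned in level with its noisy input, which matters for supervised enhancement training.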
To ensure dataset reliability, a six-stage verification pipeline was implemented. The stages described cover dataset structure validation, sampling-rate verification, SNR verification, perceptual quality evaluation (PESQ), and acoustic differentiation analysis between clean and noisy signals. Verification reported 100% SNR compliance across all 32,000 pairs. Average PESQ scores were 2.05 at 10 dB and 2.95 at 20 dB; the lower score at the lower SNR is consistent with the expected perceptual degradation from heavier noise. SWARABANGLA-DLSE-BN thus provides a large-scale, fully verified Bengali clean–noisy speech corpus for machine learning and deep learning research in speech enhancement. It addresses the critical shortage of paired noisy speech datasets for Indian languages. It also supports robust development and benchmarking of speech enhancement algorithms for the Bengali-speaking communities of West Bengal, Bangladesh, and the broader Bengali diaspora worldwide.
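Two of the verification stages can be sketched with the Python standard library `wave` module plus NumPy: the sampling-rate/format check and the SNR check. This is an illustrative sketch under stated assumptions, not the authors' pipeline. The function names are hypothetical. SNR is measured as the energy ratio of the clean signal to the residual (noisy minus clean). The 0.1 dB tolerance is an assumed value chosen to absorb 16-bit quantisation error; the entry does not state the tolerance actually used.

```python
import numpy as np
import wave

def check_wav_format(path_or_file, rate=16000, channels=1, sampwidth=2) -> bool:
    """Hypothetical check: True if the WAV is 16 kHz, mono, 16-bit uncompressed PCM."""
    with wave.open(path_or_file, "rb") as wf:
        return (wf.getframerate() == rate
                and wf.getnchannels() == channels
                and wf.getsampwidth() == sampwidth
                and wf.getcomptype() == "NONE")

def read_pcm16(path_or_file) -> np.ndarray:
    """Read a mono 16-bit PCM WAV into float64 samples in [-1, 1)."""
    with wave.open(path_or_file, "rb") as wf:
        frames = wf.readframes(wf.getnframes())
    return np.frombuffer(frames, dtype="<i2").astype(np.float64) / 32768.0

def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """SNR in dB, treating (noisy - clean) as the added noise."""
    residual = noisy - clean
    return float(10 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2)))

def snr_compliant(clean, noisy, target_db: float, tol_db: float = 0.1) -> bool:
    """Assumed tolerance: a pair passes if its measured SNR is within tol_db of target."""
    return abs(measured_snr_db(clean, noisy) - target_db) <= tol_db
```

In use, each pair would be loaded with `read_pcm16`, checked with `check_wav_format`, and then tested with `snr_compliant(clean, noisy, 10.0)` or `snr_compliant(clean, noisy, 20.0)` according to its SNR level. Perceptual scoring such as PESQ needs a separate implementation and is not shown here.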
Provided by:
IEEE DataPort
Created:
2026-03-28