"BHARAVANI-DLSE-HI: A Deep Learning-Ready Hindi Noisy Speech Dataset for Neural Speech Enhancement Using TRAD-MNSC Formulation."

Name: "BHARAVANI-DLSE-HI: A Deep Learning-Ready Hindi Noisy Speech Dataset for Neural Speech Enhancement Using TRAD-MNSC Formulation."
Creator: IEEE DataPort
Published: 2026-03-28 07:47:59
License: No description available

DataCite Commons | Updated 2026-03-28 | Indexed 2026-05-03

Download link:

https://ieee-dataport.org/documents/bharavani-dlse-hi-deep-learning-ready-hindi-noisy-speech-dataset-neural-speech


Description:

"BHARAVANI-DLSE-HI is a rigorously verified, deep learning-ready noisy speech dataset for the Hindi language (ISO 639-1: hi), developed at the Indian Institute of Technology Kharagpur as part of a PhD research programme on multilingual speech enhancement. The name is derived from two Sanskrit and Hindi words \u2014 Bhara (\u092d\u0930\u093e: filled, laden, resonant) and Vani (\u0935\u093e\u0923\u0940: speech, voice, divine utterance) \u2014 together forming Bharavani (\u092d\u0930\u093e\u0935\u093e\u0923\u0940), meaning the resonant, laden voice; a tribute to Hindi as the official language of India and the mother tongue of hundreds of millions across the Hindi Belt, carrying within it the literary voices of Kabir, Tulsidas, Mirabai, and Premchand. The name also draws an implicit connection to Bharata (\u092d\u0930\u0924), the ancient Sanskrit name for India, making BHARAVANI-DLSE-HI a dataset that honours both the voice and the nation. The dataset contains 64,000 WAV audio files, consisting of 32,000 clean\u2013noisy paired samples generated at two controlled Signal-to-Noise Ratio (SNR) levels: 10 dB and 20 dB. Noisy speech signals are produced by systematically mixing clean Hindi speech recordings with real-world environmental noise from the DEMAND acoustic noise database using the TRAD-MNSC mixing formulation with joint normalisation, ensuring precise SNR preservation across all pairs. All audio files are stored in 16 kHz mono 16-bit PCM WAV format, enabling direct compatibility with modern deep learning frameworks including PyTorch, TensorFlow, and JAX. To ensure dataset reliability, a six-stage verification pipeline was implemented, covering dataset structure validation, sampling-rate verification, SNR verification, perceptual quality evaluation, and acoustic differentiation analysis between clean and noisy signals. 
The verification process achieved 100% SNR compliance across all 32,000 pairs, with average PESQ scores of 2.05 at 10 dB and 2.95 at 20 dB, consistent with expected perceptual degradation characteristics. BHARAVANI-DLSE-HI provides a large-scale, fully verified Hindi clean\u2013noisy speech corpus designed for machine learning and deep learning research in speech enhancement, addressing the critical shortage of paired noisy speech datasets for Indian languages and enabling robust development and benchmarking of speech enhancement algorithms for the most widely spoken language of the Indian subcontinent."
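The description does not spell out the TRAD-MNSC formulation, so the following is only a minimal sketch of the generic idea it refers to: scaling a noise clip so that a clean signal and the noise meet a target SNR, then applying one shared scale factor to both signals so the ratio is preserved. The function names `mix_at_snr` and `measured_snr_db`, the peak-based form of the joint normalisation, and the synthetic sine and random-noise stand-ins are all assumptions made for illustration; they are not taken from the dataset's own code.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` to hit `snr_db` against `clean`, mix, and jointly normalise.

    Illustrative only: assumes both inputs are non-silent 1-D float arrays.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Tile or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)

    # Choose gain g so that clean_power / (g**2 * noise_power) == 10**(snr_db / 10).
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    noisy = clean + gain * noise

    # Joint normalisation (assumed form): a single scale factor applied to
    # both signals keeps the pair inside [-1, 1] without changing the SNR.
    peak = max(np.max(np.abs(clean)), np.max(np.abs(noisy)))
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return clean * scale, noisy * scale


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    clean = np.asarray(clean, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins: one second of a 220 Hz tone at 16 kHz and a short noise clip.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(4000)

for target_db in (10.0, 20.0):
    clean_out, noisy_out = mix_at_snr(clean, noise, target_db)
    print(target_db, measured_snr_db(clean_out, noisy_out))
```

Because the same factor multiplies the clean and noisy signals, the power ratio between the clean signal and the residual noise is unchanged, which is why a check like `measured_snr_db` can recover the target value for each pair.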

Provider:

IEEE DataPort

Created:

2026-03-28
