
somezay/everyayah-masjid-augmented

Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/somezay/everyayah-masjid-augmented
Resource description:
---
license: cc-by-4.0
language:
- ar
pretty_name: EveryAyah Mosque-Environment Augmented
size_categories:
- 10K<n<100K
task_categories:
- automatic-speech-recognition
tags:
- quran
- arabic
- mosque
- audio-augmentation
- whisper
- fine-tuning
- asr
---

# EveryAyah: Mosque-Environment Augmented Dataset

Quran recitation audio with realistic mosque acoustic augmentation applied, designed for fine-tuning ASR (Automatic Speech Recognition) models that need to work in real mosque/masjid environments.

## Why This Dataset Exists

Standard Quran recitation datasets (like EveryAyah) are recorded in studio conditions. ASR models trained on clean audio perform poorly in real mosques due to:

- **Heavy low-pass filtering**: mosque rooms act as natural LPFs (Low-Pass Filters), with 97% of energy below 500 Hz
- **Reverberation**: large open prayer halls create RT60 times of 1-2 seconds
- **Low-frequency resonance**: bass buildup from room modes around 80-200 Hz
- **Ambient noise**: air conditioning hum, shuffling, breathing

This dataset applies empirically-matched augmentation based on spectral analysis of real mosque recordings (spectral rolloff measured at 478 Hz, -20 dB/octave above 500 Hz).

## Dataset Structure

```
masjid_medium/{reciter}/{SSS_AAA}.mp3   # moderate mosque conditions
masjid_heavy/{reciter}/{SSS_AAA}.mp3    # harsh mosque conditions
manifest_upload.jsonl                   # metadata (path, text, surah, ayah, reciter, augmentation)
```

### File Naming

- `SSS` = 3-digit surah number (001-114)
- `AAA` = 3-digit ayah number

### Augmentation Presets

| Preset | LPF Cutoff | Reverb RT60 | Wet Mix | Bass Boost | Gaussian SNR | Hum SNR |
|--------|-----------|-------------|---------|------------|-------------|---------|
| **masjid_medium** | 600 Hz | 1.2s | 0.28 | +4 dB @ 120 Hz | 28 dB | 28 dB |
| **masjid_heavy** | 500 Hz | 1.7s | 0.40 | +5 dB @ 120 Hz | 25 dB | 25 dB |

The augmentation chain applies (in order):

1. Reverb (synthetic impulse response)
2. Low-frequency resonance boost
3. Gaussian noise
4. Low-frequency hum (50/60 Hz harmonics)
5. Low-pass filter (simulating room acoustics)

### Reciters (from EveryAyah.com)

| Reciter | Clips per augmentation |
|---------|----------------------|
| Alafasy_128kbps | 6,236 |
| Husary_128kbps | 6,235 |
| Abdul_Basit_Murattal_192kbps | 6,234 |
| MaherAlMuaiqly128kbps | 6,236 |
| Minshawy_Murattal_128kbps | 6,228 |

**Total: ~62,338 augmented clips** across 2 augmentation levels.

## Clean Audio

Clean (unaugmented) audio is not included; download it directly from [EveryAyah.com](https://everyayah.com/).

A lighter augmentation level (`masjid_light`: LPF@800Hz, SNR 32dB) was also generated but excluded from this upload as it's close enough to clean audio to be less useful for training.

## Manifest Format

Each line in `manifest_upload.jsonl` is a JSON object:

```json
{
  "path": "masjid_heavy/Alafasy_128kbps/001_001.mp3",
  "text": "بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمِ",
  "surah": 1,
  "ayah": 1,
  "reciter": "Alafasy_128kbps",
  "augmentation": "masjid_heavy"
}
```

## Intended Use

- Fine-tuning Whisper (or other ASR models) for mosque environments
- Training noise-robust Quran recitation recognizers
- Benchmarking ASR robustness to room acoustics

## How It Was Made

1. Downloaded all ayah-level MP3s from [EveryAyah.com](https://everyayah.com/) for 5 reciters
2. Decoded to 16 kHz mono WAV
3. Applied augmentation chain calibrated against real mosque recordings
4. Re-encoded to 128 kbps MP3

Spectral calibration was done by comparing synthetic augmentation output against real Tarawih prayer recordings captured on a Galaxy A33 phone placed on the mosque floor.

## Citation

If you use this dataset, please credit EveryAyah.com as the original audio source.

## License

Audio content: recordings from EveryAyah.com. Augmentation and metadata: CC-BY-4.0.
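
## Example: Reading the Manifest

The manifest format and file-naming scheme described above can be checked with a short script. This is a minimal sketch, not part of the dataset's tooling: the helper names `read_manifest`, `expected_path`, and `summarize` are hypothetical, and it assumes `manifest_upload.jsonl` has been downloaded locally and that each line carries the six fields shown in the Manifest Format section.

```python
import json
from collections import Counter


def read_manifest(manifest_path):
    """Yield one dict per non-empty line of a JSONL manifest."""
    with open(manifest_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def expected_path(augmentation, reciter, surah, ayah):
    """Rebuild the relative path from the documented SSS_AAA naming scheme."""
    return f"{augmentation}/{reciter}/{surah:03d}_{ayah:03d}.mp3"


def summarize(manifest_path):
    """Count clips per (augmentation, reciter) and list paths that break the scheme."""
    counts = Counter()
    mismatches = []
    for row in read_manifest(manifest_path):
        counts[(row["augmentation"], row["reciter"])] += 1
        rebuilt = expected_path(
            row["augmentation"], row["reciter"], row["surah"], row["ayah"]
        )
        if row["path"] != rebuilt:
            mismatches.append(row["path"])
    return counts, mismatches
```

Calling `summarize("manifest_upload.jsonl")` returns the per-reciter counts for each augmentation level, which can be compared against the reciter table, along with any paths that do not match the `{augmentation}/{reciter}/{SSS_AAA}.mp3` pattern.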
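
## Example: Mixing Noise at a Target SNR

The preset table lists separate SNR targets for Gaussian noise and for hum (steps 3 and 4 of the augmentation chain). The dataset's own augmentation code is not published here, so the following is only an illustrative sketch of how an interfering signal can be scaled to a given SNR. All function names are hypothetical, and the 50 Hz base frequency, the number of harmonics, and the 1/k harmonic amplitudes are assumptions rather than the author's settings. Reverb, bass boost, and the low-pass filter are not covered.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def scale_to_snr(signal, interference, snr_db):
    """Rescale `interference` so that signal power / interference power equals snr_db."""
    target_power = power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_power / power(interference))
    return [gain * x for x in interference]


def gaussian_noise(n, seed=0):
    """White Gaussian noise with unit variance (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mains_hum(n, sample_rate, base_hz=50.0, harmonics=3):
    """Sum of a mains fundamental and its harmonics; 1/k amplitudes are an assumption."""
    return [
        sum(
            math.sin(2 * math.pi * base_hz * k * i / sample_rate) / k
            for k in range(1, harmonics + 1)
        )
        for i in range(n)
    ]


def add_noise_and_hum(signal, sample_rate, gaussian_snr_db, hum_snr_db, seed=0):
    """Add Gaussian noise and hum, each scaled independently against the input signal."""
    noise = scale_to_snr(signal, gaussian_noise(len(signal), seed), gaussian_snr_db)
    hum = scale_to_snr(signal, mains_hum(len(signal), sample_rate), hum_snr_db)
    return [s + n + h for s, n, h in zip(signal, noise, hum)]


def measured_snr_db(signal, interference):
    """SNR in dB between a signal and an interfering component."""
    return 10 * math.log10(power(signal) / power(interference))
```

With the `masjid_heavy` values from the table, a call would look like `add_noise_and_hum(samples, 16000, 25, 25)`, where `samples` is a list of floats decoded at 16 kHz. The sketch uses plain lists to stay dependency-free; the same arithmetic carries over directly to NumPy arrays.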
Provider:
somezay