CraneAILabs/waxal-lug-clean

Name: CraneAILabs/waxal-lug-clean
Creator: CraneAILabs
Published: 2026-03-15 17:54:53
License: CC-BY-4.0

Hugging Face · Updated 2026-03-15 · Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/CraneAILabs/waxal-lug-clean

Description:

---
language:
- lug
license: cc-by-4.0
pretty_name: Waxal Luganda TTS (Cleaned)
tags:
- tts
- text-to-speech
- luganda
- african-languages
- vits
- mms
- audio
- speech
dataset_info:
  features:
  - name: id
    dtype: string
  - name: speaker_id
    dtype: string
  - name: text
    dtype: string
  - name: locale
    dtype: string
  - name: gender
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: train
    num_examples: 1608
  - name: validation
    num_examples: 211
  - name: test
    num_examples: 205
source_datasets:
- google/WaxalNLP
task_categories:
- text-to-speech
size_categories:
- 1K<n<10K
---

# Waxal Luganda TTS (Cleaned)

A cleaned version of the Luganda TTS subset from [Google's WaxalNLP dataset](https://huggingface.co/datasets/google/WaxalNLP), preprocessed for fine-tuning text-to-speech models.

## What changed from the original?

The original Waxal recordings contain **click/pop artifacts** at the start and end of audio clips (likely from the recording equipment). These transients degrade TTS model quality during fine-tuning.

This dataset applies **Silero VAD (Voice Activity Detection)** to precisely detect speech boundaries and trim non-speech regions, removing the clicks while preserving all spoken content.
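The trimming step described above can be sketched as follows. This is a minimal illustration, not the authors' actual cleaning code: the function `trim_with_fade` and its parameter names are hypothetical, the speech boundaries are assumed to come from a VAD such as Silero VAD (as sample indices), and the 30 ms padding and 3 ms fade defaults are taken from the values listed under "Cleaning pipeline". It uses plain Python lists to stay self-contained; a real pipeline would more likely operate on NumPy arrays or tensors.

```python
from typing import List, Sequence


def trim_with_fade(
    samples: Sequence[float],
    sample_rate: int,
    speech_start: int,
    speech_end: int,
    pad_ms: float = 30.0,
    fade_ms: float = 3.0,
) -> List[float]:
    """Keep samples[speech_start:speech_end] plus padding, then fade both edges.

    speech_start / speech_end are sample indices, assumed to be supplied by
    a voice activity detector (hypothetical input, not computed here).
    """
    pad = int(sample_rate * pad_ms / 1000)
    fade = int(sample_rate * fade_ms / 1000)

    # Widen the detected speech region by the padding, clamped to the clip.
    start = max(0, speech_start - pad)
    end = min(len(samples), speech_end + pad)
    out = list(samples[start:end])

    # Linear fade-in and fade-out so the new cut points do not click.
    fade = min(fade, len(out) // 2)
    for i in range(fade):
        gain = i / fade
        out[i] *= gain
        out[-1 - i] *= gain
    return out


# Example: one second of constant signal at 48 kHz, speech assumed
# to span samples 12000..36000.
clip = [1.0] * 48000
trimmed = trim_with_fade(clip, 48000, speech_start=12000, speech_end=36000)
print(len(trimmed))
```

Anything outside the padded speech region, including a click at the very start or end of the recording, is dropped, and the short fades keep the new cut points from introducing transients of their own.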
### Cleaning pipeline

| Step | Detail |
|------|--------|
| VAD model | [Silero VAD](https://github.com/snakers4/silero-vad) |
| Speech threshold | 0.3 |
| Padding | 30 ms before/after detected speech |
| Fade in/out | 3 ms (prevents new edge artifacts) |
| Min speech duration | 100 ms |
| Min silence duration | 30 ms |

## Dataset details

| Split | Samples |
|-------|---------|
| Train | 1,608 |
| Validation | 211 |
| Test | 205 |
| **Total** | **2,024** |

- **Language:** Luganda (lug)
- **Sample rate:** 48,000 Hz
- **Speakers:** 8 (4 male, 4 female)
- **License:** CC-BY-4.0 (inherited from WaxalNLP)

## Speaker distribution

| Speaker | Gender | Samples |
|---------|--------|---------|
| 1 | Male | 209 |
| 2 | Male | 214 |
| 3 | Male | 201 |
| 4 | Female | 192 |
| 5 | Female | 203 |
| 6 | Female | 192 |
| 7 | Male | 201 |
| 8 | Male | 196 |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("Cal3bd3v/waxal-lug-clean")

# Access a sample
sample = ds["train"][0]
print(sample["text"])   # "Onaasaala e Juma ku Lwokutaano luno?"
print(sample["audio"])  # {'array': array([...]), 'sampling_rate': 48000}
```

### Fine-tuning MMS-TTS

```bash
python train_tts.py --dataset-name Cal3bd3v/waxal-lug-clean --text-column text
```

## Citation

If you use this dataset, please cite the original WaxalNLP paper:

```bibtex
@article{waxal2025,
  title={WAXAL: A Large-Scale Multilingual African Language Speech Corpus},
  author={Google Research},
  journal={arXiv preprint arXiv:2602.02734},
  year={2025}
}
```

## Acknowledgments

- **Original data:** [Google WaxalNLP](https://huggingface.co/datasets/google/WaxalNLP), collected by Makerere University and partners
- **Cleaning:** Silero VAD for speech boundary detection
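As a quick consistency check on the figures in this card, the sketch below tallies the split and speaker tables. All numbers are copied from the card itself; nothing is read from the dataset. The per-speaker counts sum to 1,608, the size of the train split, which suggests (but does not confirm) that the speaker table describes the train split only. The table also lists five male and three female speakers, whereas the dataset details state four of each, so it may be worth checking the `gender` column after loading.

```python
from collections import Counter

# Figures copied from the split and speaker tables in this card.
SPLIT_SIZES = {"train": 1608, "validation": 211, "test": 205}
SPEAKER_ROWS = [
    (1, "Male", 209),
    (2, "Male", 214),
    (3, "Male", 201),
    (4, "Female", 192),
    (5, "Female", 203),
    (6, "Female", 192),
    (7, "Male", 201),
    (8, "Male", 196),
]

# Total across all splits.
total_samples = sum(SPLIT_SIZES.values())

# Total across the speaker table, to compare against the split sizes.
speaker_samples = sum(count for _, _, count in SPEAKER_ROWS)

# Number of speakers and number of samples per gender, as listed in the table.
speakers_per_gender = Counter(gender for _, gender, _ in SPEAKER_ROWS)
samples_per_gender = Counter()
for _, gender, count in SPEAKER_ROWS:
    samples_per_gender[gender] += count

print(total_samples)
print(speaker_samples)
print(dict(speakers_per_gender))
print(dict(samples_per_gender))
```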

Provider:

CraneAILabs
