CraneAILabs/waxal-lug-clean
Source: Hugging Face | Updated: 2026-03-15 | Indexed: 2026-04-05
Download link:
https://hf-mirror.com/datasets/CraneAILabs/waxal-lug-clean
Resource description:
---
language:
- lug
license: cc-by-4.0
pretty_name: Waxal Luganda TTS (Cleaned)
tags:
- tts
- text-to-speech
- luganda
- african-languages
- vits
- mms
- audio
- speech
dataset_info:
  features:
  - name: id
    dtype: string
  - name: speaker_id
    dtype: string
  - name: text
    dtype: string
  - name: locale
    dtype: string
  - name: gender
    dtype: string
  - name: audio
    dtype: audio
  splits:
  - name: train
    num_examples: 1608
  - name: validation
    num_examples: 211
  - name: test
    num_examples: 205
source_datasets:
- google/WaxalNLP
task_categories:
- text-to-speech
size_categories:
- 1K<n<10K
---
# Waxal Luganda TTS (Cleaned)
A cleaned version of the Luganda TTS subset from [Google's WaxalNLP dataset](https://huggingface.co/datasets/google/WaxalNLP), preprocessed for fine-tuning text-to-speech models.
## What changed from the original?
The original Waxal recordings contain **click/pop artifacts** at the start and end of audio clips (likely from the recording equipment). These transients degrade TTS model quality during fine-tuning.
This dataset applies **Silero VAD (Voice Activity Detection)** to precisely detect speech boundaries and trim non-speech regions, removing the clicks while preserving all spoken content.
### Cleaning pipeline
| Step | Detail |
|------|--------|
| VAD model | [Silero VAD](https://github.com/snakers4/silero-vad) |
| Speech threshold | 0.3 |
| Padding | 30ms before/after detected speech |
| Fade in/out | 3ms (prevents new edge artifacts) |
| Min speech duration | 100ms |
| Min silence duration | 30ms |
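
The cleaning script itself is not included in this card, so the following is only a minimal sketch of how the padding and fade steps in the table could be applied once speech boundaries are known. It assumes the boundaries come as a list of `{"start", "end"}` dicts in samples, which is the default output shape of Silero VAD's `get_speech_timestamps` (presumably called with `threshold=0.3`, `min_speech_duration_ms=100`, `min_silence_duration_ms=30`). The function names, the linear fade shape, and the choice to trim only leading and trailing non-speech are assumptions, not the authors' confirmed method.

```python
def trim_bounds(segments, n_samples, sample_rate, pad_ms=30):
    """Return (start, end) sample indices spanning all speech, plus padding.

    `segments` is a list of {"start": int, "end": int} dicts in samples.
    """
    if not segments:
        # Assumption: if no speech is detected, leave the clip untouched.
        return 0, n_samples
    pad = int(sample_rate * pad_ms / 1000)
    start = max(0, segments[0]["start"] - pad)
    end = min(n_samples, segments[-1]["end"] + pad)
    return start, end


def apply_fades(samples, sample_rate, fade_ms=3):
    """Apply linear fade-in and fade-out ramps to a sequence of float samples."""
    out = list(samples)
    # Never let the two ramps overlap on very short clips.
    n_fade = min(int(sample_rate * fade_ms / 1000), len(out) // 2)
    for i in range(n_fade):
        gain = i / n_fade
        out[i] *= gain        # fade-in, starting from silence
        out[-1 - i] *= gain   # fade-out, ending in silence
    return out


def trim_clip(samples, segments, sample_rate, pad_ms=30, fade_ms=3):
    """Cut the clip to the padded speech region, then fade both edges."""
    start, end = trim_bounds(segments, len(samples), sample_rate, pad_ms)
    return apply_fades(samples[start:end], sample_rate, fade_ms)


# Synthetic one-second clip at 48 kHz with two hypothetical speech segments.
sr = 48000
clip = [1.0] * sr
segments = [{"start": 10000, "end": 20000}, {"start": 30000, "end": 40000}]
print(trim_bounds(segments, len(clip), sr))  # (8560, 41440)
print(len(trim_clip(clip, segments, sr)))    # 32880
```

At 48,000 Hz, 30ms of padding is 1,440 samples and a 3ms fade is 144 samples. In practice the samples would be a NumPy array or tensor rather than a Python list; plain lists are used here only to keep the sketch dependency-free.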
## Dataset details
| Split | Samples |
|-------|---------|
| Train | 1,608 |
| Validation | 211 |
| Test | 205 |
| **Total** | **2,024** |
- **Language:** Luganda (lug)
- **Sample rate:** 48,000 Hz
- **Speakers:** 8 (4 male, 4 female)
- **License:** CC-BY-4.0 (inherited from WaxalNLP)
## Speaker distribution
| Speaker | Gender | Samples |
|---------|--------|---------|
| 1 | Male | 209 |
| 2 | Male | 214 |
| 3 | Male | 201 |
| 4 | Female | 192 |
| 5 | Female | 203 |
| 6 | Female | 192 |
| 7 | Male | 201 |
| 8 | Male | 196 |
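
The per-speaker counts above sum to 1,608, which matches the size of the train split, and the table lists five male and three female speakers while the summary states four of each. A tally computed from the data itself is a quick way to check both. The sketch below counts samples per speaker and speakers per gender from any iterable of rows with `speaker_id` and `gender` fields; the helper name and the synthetic rows are illustrative assumptions, and the actual label strings in the dataset may differ.

```python
from collections import Counter


def speaker_distribution(rows):
    """Count samples per (speaker_id, gender) pair and speakers per gender."""
    per_speaker = Counter((row["speaker_id"], row["gender"]) for row in rows)
    # Each distinct (speaker, gender) pair contributes one speaker to its gender.
    speakers_by_gender = Counter(gender for _, gender in per_speaker)
    return per_speaker, speakers_by_gender


# Tiny synthetic rows for illustration only; these are not real dataset records.
rows = [
    {"speaker_id": "spk_a", "gender": "Male"},
    {"speaker_id": "spk_a", "gender": "Male"},
    {"speaker_id": "spk_b", "gender": "Female"},
    {"speaker_id": "spk_c", "gender": "Male"},
]

per_speaker, speakers_by_gender = speaker_distribution(rows)
for (speaker, gender), count in sorted(per_speaker.items()):
    print(speaker, gender, count)
print(dict(speakers_by_gender))
```

With the real data, passing `ds["train"].remove_columns("audio")` as `rows` avoids decoding every audio file while counting.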
## Usage
```python
from datasets import load_dataset

# Download (or load from cache) the train, validation, and test splits
ds = load_dataset("CraneAILabs/waxal-lug-clean")

# Access a sample
sample = ds["train"][0]
print(sample["text"])   # "Onaasaala e Juma ku Lwokutaano luno?"
print(sample["audio"])  # {'array': array([...]), 'sampling_rate': 48000}
```
### Fine-tuning MMS-TTS
```bash
python train_tts.py --dataset-name CraneAILabs/waxal-lug-clean --text-column text
```
## Citation
If you use this dataset, please cite the original WaxalNLP paper:
```bibtex
@article{waxal2025,
title={WAXAL: A Large-Scale Multilingual African Language Speech Corpus},
author={Google Research},
journal={arXiv preprint arXiv:2602.02734},
year={2026}
}
```
## Acknowledgments
- **Original data:** [Google WaxalNLP](https://huggingface.co/datasets/google/WaxalNLP), collected by Makerere University and partners
- **Cleaning:** Silero VAD for speech boundary detection
Provided by:
CraneAILabs



