ghananlpcommunity/new-twi-tts-aligned
Hugging Face · Updated 2026-04-26 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/new-twi-tts-aligned
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 26963764996
    num_examples: 145258
  - name: test
    num_bytes: 2884516661
    num_examples: 16140
  download_size: 29817514008
  dataset_size: 29848281657
---
# Twi TTS Dataset
A speech dataset of Twi (Akan) extracted from Ghanaian news media broadcasts, designed for training and fine-tuning text-to-speech (TTS) models.
## 📂 Dataset Structure
| Column | Type | Description |
|--------|------|-------------|
| `audio` | Audio | 24 kHz mono WAV audio segment |
| `text` | string | Verbatim Twi transcription of the audio segment |
| `duration` | float | Duration of the audio segment in seconds (not declared among the `dataset_info` features; can be derived from `audio`) |
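The `dataset_info` metadata at the top of this card lists only `audio` and `text` as features, so a stored `duration` column may not be present. A minimal sketch of deriving it, assuming the decoded `audio` value is a dict with an `array` of samples and a `sampling_rate` (the same access pattern the usage example relies on); `add_duration` is a hypothetical helper name:

```python
def add_duration(example: dict) -> dict:
    """Return a `duration` field (in seconds) derived from the decoded audio.

    Assumes example["audio"] is a dict holding an "array" of samples and an
    integer "sampling_rate" (24000 Hz for this dataset).
    """
    audio = example["audio"]
    num_samples = len(audio["array"])
    return {"duration": num_samples / audio["sampling_rate"]}


# Synthetic example: 36,000 samples at 24 kHz is 1.5 seconds of audio
fake = {"audio": {"array": [0.0] * 36000, "sampling_rate": 24000}}
print(add_duration(fake))  # {'duration': 1.5}
```

With 🤗 Datasets, `Dataset.map(add_duration)` would add the value as a new column; note that this decodes every clip, which is slow on a split of this size.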
## 📊 Statistics
| Metric | Value |
|--------|-------|
| Total clips | 161,398 |
| Total duration | 172.44 hours |
| Mean clip duration | 3.85 s |
| Min / Max clip duration | 0.10 s / 29.82 s |
| Mean words per clip | 11.5 |
| Min / Max words | 1 / 81 |
| Vocabulary size | 64,812 unique words |
| Sample rate | 24,000 Hz (mono) |
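The figures in the YAML header and in this table can be cross-checked with simple arithmetic. A minimal sketch; the 16-bit PCM storage assumption is ours and is not stated in the card:

```python
# Figures copied from the YAML header and the statistics table of this card
TRAIN_CLIPS, TEST_CLIPS = 145_258, 16_140
TRAIN_BYTES, TEST_BYTES = 26_963_764_996, 2_884_516_661
DATASET_BYTES = 29_848_281_657  # dataset_size
TOTAL_CLIPS = 161_398
TOTAL_HOURS = 172.44
SAMPLE_RATE = 24_000  # Hz, mono

# Assumption (not stated in the card): samples are stored as 16-bit PCM
BYTES_PER_SAMPLE = 2

total_seconds = TOTAL_HOURS * 3600
mean_clip_s = total_seconds / TOTAL_CLIPS
pcm_bytes = total_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE

print("train + test clips:", TRAIN_CLIPS + TEST_CLIPS)
print("train + test bytes:", TRAIN_BYTES + TEST_BYTES)
print(f"mean clip duration: {mean_clip_s:.2f} s")
print(f"16-bit PCM estimate: {pcm_bytes / 1e9:.2f} GB (dataset_size: {DATASET_BYTES / 1e9:.2f} GB)")
```

The split counts and byte counts add up exactly to the totals, the mean reproduces the 3.85 s in the table, and the 16-bit estimate lands within about 0.2% of `dataset_size`. That is consistent with, though not proof of, uncompressed 16-bit audio.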
## 🚀 Usage
```python
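from datasets import load_dataset

# Downloads and prepares both splits (about 30 GB in total)
dataset = load_dataset("ghananlpcommunity/new-twi-tts-aligned")
train = dataset["train"]

example = train[0]
audio = example["audio"]  # decoded audio with "array" and "sampling_rate" entries
print("Transcription:", example["text"])
# `duration` is not declared in dataset_info, so derive it from the waveform
print("Duration (s):", len(audio["array"]) / audio["sampling_rate"])
print("Audio array shape:", audio["array"].shape)
print("Sample rate:", audio["sampling_rate"])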
```
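Because the full download is roughly 30 GB (`download_size: 29817514008`), streaming may be more practical for a first look or for building a filtered training stream. A minimal sketch, assuming the `streaming=True` mode of `load_dataset` and `IterableDataset.filter` from 🤗 Datasets, and the same dict-style audio access as above. The helper names `within_duration` and `stream_split` and the 1 to 20 s window are hypothetical choices for illustration (clips in this dataset range from 0.10 s to 29.82 s), not recommendations from the dataset authors:

```python
def within_duration(example: dict, min_s: float = 1.0, max_s: float = 20.0) -> bool:
    """Return True if the clip length, derived from its decoded audio, is in [min_s, max_s]."""
    audio = example["audio"]
    seconds = len(audio["array"]) / audio["sampling_rate"]
    return min_s <= seconds <= max_s


def stream_split(split: str = "train", min_s: float = 1.0, max_s: float = 20.0):
    """Stream one split lazily and keep only clips inside the duration window."""
    # Imported inside the function so within_duration() works without `datasets`
    from datasets import load_dataset

    stream = load_dataset(
        "ghananlpcommunity/new-twi-tts-aligned", split=split, streaming=True
    )
    # IterableDataset.filter is lazy: each clip is checked as it is streamed
    return stream.filter(lambda ex: within_duration(ex, min_s, max_s))
```

For example, `for ex in stream_split("test").take(3): print(ex["text"])` would print the first three test transcriptions that pass the filter without fetching the whole split first.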
## 🎯 Intended Use Cases
* Training Twi (Akan) TTS models from scratch or fine-tuning existing ones
* Linguistic research on Twi phonology and prosody
* Low-resource African language ASR benchmarking
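For ASR benchmarking, the card does not prescribe a metric; word error rate (WER) against the `text` transcriptions is a common choice. A minimal sketch, assuming plain whitespace tokenization with no further text normalization (a choice that matters for Twi orthography and is left to the user); `word_edit_distance` and `corpus_wer` are hypothetical helpers, not part of any library named here:

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j] (previous DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus WER: total word edits divided by total reference words.

    `pairs` holds (reference transcription, ASR hypothesis) strings.
    """
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()
        edits += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


# 2 edits (one substitution, one deletion) over 6 reference words
print(corpus_wer([("a b c d", "a x c"), ("a b", "a b")]))
```

Summing edits and reference lengths across clips, rather than averaging per-clip WER, keeps short clips (some here are a single word) from dominating the score.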
## 📜 Citation
```bibtex
@dataset{twi_tts,
author = {Owusu, Mich-Seth},
title = {Twi TTS Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/ghananlpcommunity/new-twi-tts-aligned}
}
```
## 🙏 Acknowledgments
Created by Mich-Seth Owusu for the Ghana NLP Community.
Provided by:
ghananlpcommunity



