ghananlpcommunity/twi-speech-sota-240hrs
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/twi-speech-sota-240hrs
Description:
---
language:
- tw
- ak
license: cc-by-4.0
tags:
- audio
- speech
- tts
- twi
- akan
- ghanaian-languages
task_categories:
- automatic-speech-recognition
- text-to-speech
pretty_name: Twi TTS Dataset
size_categories:
- 100K<n<1M
---
# 🇬🇭 Twi ASR Dataset
A **Twi (Akan)** speech dataset extracted from Ghanaian news media broadcasts,
designed for training and fine-tuning **text-to-speech (TTS)** models and also usable for **automatic speech recognition (ASR)**.

---
## 📂 Dataset Structure
| Column | Type | Description |
|-----------------|--------|--------------------------------------------------|
| `audio` | Audio | 24 kHz mono WAV audio segment |
| `text` | string | Verbatim Twi transcription of the audio segment |
| `duration` | float | Duration of the audio segment in seconds |
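
The column layout above implies a simple consistency check: `duration` should roughly equal the number of audio samples divided by the sample rate. The following is a minimal sketch of such a check, not part of the dataset's own tooling. The function name `check_record` and the 0.05 s tolerance are assumptions chosen for illustration.

```python
EXPECTED_SAMPLE_RATE = 24_000  # Hz, as listed for the `audio` column


def check_record(record, tolerance_s=0.05):
    """Return a list of problems found in one row (empty if none).

    `record` is assumed to look like a decoded row:
    {"audio": {"array": ..., "sampling_rate": int}, "text": str, "duration": float}
    """
    problems = []
    audio = record["audio"]

    if audio["sampling_rate"] != EXPECTED_SAMPLE_RATE:
        problems.append(f"unexpected sample rate: {audio['sampling_rate']}")

    if not isinstance(record["text"], str) or not record["text"].strip():
        problems.append("empty or non-string transcription")

    # Duration implied by the mono waveform length
    implied_s = len(audio["array"]) / audio["sampling_rate"]
    if abs(implied_s - record["duration"]) > tolerance_s:
        problems.append(
            f"duration mismatch: column says {record['duration']:.2f} s, "
            f"audio implies {implied_s:.2f} s"
        )
    return problems
```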
---
## 📊 Statistics
| Metric | Value |
|-------------------------|----------------------------------|
| Total clips | 132,212 |
| Total duration | **237.71 hours** |
| Mean clip duration | 6.47 s |
| Min / Max clip duration | 1.01 s / 15.0 s |
| Mean words per clip | 16.0 |
| Min / Max words | 1 / 16 |
| Vocabulary size | 42,970 unique words |
| Sample rate | 24,000 Hz (mono) |
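
The figures in this table could be recomputed from the `duration` and `text` columns roughly as follows. This is a sketch only: the helper `summarize` is hypothetical, and it assumes words are split on whitespace and counted case-sensitively, which may not match how the values above were produced.

```python
def summarize(durations, texts):
    """Compute clip, duration, and word statistics from two parallel columns."""
    durations = list(durations)
    # Assumption: a "word" is any whitespace-separated token
    tokenized = [text.split() for text in texts]
    word_counts = [len(tokens) for tokens in tokenized]
    vocabulary = {word for tokens in tokenized for word in tokens}

    return {
        "total_clips": len(durations),
        "total_hours": sum(durations) / 3600,
        "mean_duration_s": sum(durations) / len(durations),
        "min_duration_s": min(durations),
        "max_duration_s": max(durations),
        "mean_words": sum(word_counts) / len(word_counts),
        "min_words": min(word_counts),
        "max_words": max(word_counts),
        "vocab_size": len(vocabulary),
    }


# Possible usage once the dataset is loaded (reads only the scalar columns):
# stats = summarize(train["duration"], train["text"])
```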
---
## 🚀 Usage
```python
from datasets import load_dataset

# Download (or load from the local cache) all splits of the dataset
dataset = load_dataset("ghananlpcommunity/twi-speech-sota-240hrs")
train = dataset["train"]

# Accessing a row decodes its audio into a waveform array
example = train[0]
print("Transcription:", example["text"])
print("Duration (s):", example["duration"])
print("Audio array shape:", example["audio"]["array"].shape)
print("Sample rate:", example["audio"]["sampling_rate"])
```
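
Because the full dataset holds roughly 238 hours of audio, it may be convenient to inspect a few rows without downloading everything. The sketch below uses the `streaming=True` option of `load_dataset` and keeps only clips within a duration range. The helper names `take_within_duration` and `preview_stream`, and the 2 to 10 second bounds, are illustrative assumptions rather than recommendations from the dataset authors. Calling `preview_stream()` requires network access.

```python
from itertools import islice


def take_within_duration(rows, min_s=2.0, max_s=10.0, limit=None):
    """Return up to `limit` rows whose `duration` lies in [min_s, max_s]."""
    kept = (row for row in rows if min_s <= row["duration"] <= max_s)
    return list(islice(kept, limit))  # limit=None keeps every matching row


def preview_stream(n=3):
    """Print the first `n` matching clips from the streamed train split."""
    from datasets import load_dataset  # imported here so the filter stays dependency-free

    stream = load_dataset(
        "ghananlpcommunity/twi-speech-sota-240hrs",
        split="train",
        streaming=True,  # iterate over rows lazily instead of downloading all files
    )
    for row in take_within_duration(stream, limit=n):
        print(f"{row['duration']:.2f} s:", row["text"])
```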
---
## 🎯 Intended Use Cases
- Building TTS models from scratch or fine-tuning existing ones for **Twi (Akan)**
- Linguistic research on Twi phonology and prosody
- Low-resource African language ASR benchmarking
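
For the ASR benchmarking use case, results are commonly reported as word error rate (WER): the word-level edit distance between a reference transcription and a model's output, divided by the number of reference words. The following is a minimal pure-Python sketch. The function name `word_error_rate` is hypothetical, tokens are split on whitespace with no text normalization, and the strings in the usage comment are placeholders rather than real Twi transcriptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Possible usage with placeholder tokens:
# word_error_rate("w1 w2 w3", "w1 wX w3")  # one substitution out of three words
```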
---
## 📜 Citation
```bibtex
@dataset{twi_asr,
author = {Owusu, Mich-Seth},
title = {Twi ASR Dataset},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/ghananlpcommunity/twi-speech-sota-240hrs}
}
```
---
## 🙏 Acknowledgments
Created by **Mich-Seth Owusu** for the **Ghana NLP Community**.

Provider:
ghananlpcommunity



