
ghananlpcommunity/new-twi-tts-aligned

Source: Hugging Face | Updated: 2026-04-26 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/new-twi-tts-aligned
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 24000
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 26963764996
    num_examples: 145258
  - name: test
    num_bytes: 2884516661
    num_examples: 16140
  download_size: 29817514008
  dataset_size: 29848281657
---

# Twi TTS Dataset

A speech dataset of Twi (Akan) extracted from Ghanaian news media broadcasts, designed for training and fine-tuning Text-To-Speech (TTS) models.

## 📂 Dataset Structure

| Column | Type | Description |
|--------|------|-------------|
| `audio` | Audio | 24 kHz mono WAV audio segment |
| `text` | string | Verbatim Twi transcription of the audio segment |
| `duration` | float | Duration of the audio segment in seconds |

## 📊 Statistics

| Metric | Value |
|--------|-------|
| Total clips | 161,398 |
| Total duration | 172.44 hours |
| Mean clip duration | 3.85 s |
| Min / Max clip duration | 0.10 s / 29.82 s |
| Mean words per clip | 11.5 |
| Min / Max words | 1 / 81 |
| Vocabulary size | 64,812 unique words |
| Sample rate | 24,000 Hz (mono) |

## 🚀 Usage

```python
from datasets import load_dataset

# Download and load both splits (about 30 GB in total).
dataset = load_dataset("ghananlpcommunity/new-twi-tts-aligned")
train = dataset["train"]

# Inspect the first training example.
example = train[0]
audio = example["audio"]
print("Transcription:", example["text"])
# The metadata header declares only `audio` and `text`, so fall back to
# computing the duration from the waveform if `duration` is absent.
duration = example.get("duration", len(audio["array"]) / audio["sampling_rate"])
print("Duration (s):", duration)
print("Audio array shape:", audio["array"].shape)
print("Sample rate:", audio["sampling_rate"])
```

## 🎯 Intended Use Cases

* Building TTS models from scratch or fine-tuning them for Twi (Akan)
* Linguistic research on Twi phonology and prosody
* Low-resource African language ASR benchmarking

## 📜 Citation

```bibtex
@dataset{twi_tts,
  author    = {Owusu, Mich-Seth},
  title     = {Twi TTS Dataset},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ghananlpcommunity/new-twi-tts-aligned}
}
```

## 🙏 Acknowledgments

Created by Mich-Seth Owusu for the Ghana NLP Community.
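
The statistics above report clip durations from 0.10 s to 29.82 s, and TTS training pipelines often drop very short or very long clips. The following is a minimal sketch of such a filter, not part of the dataset card: the helper names `clip_duration` and `keep_for_tts` and the thresholds `MIN_SECONDS` and `MAX_SECONDS` are hypothetical, and the code assumes each example's `audio` field decodes to a mapping with `array` and `sampling_rate`, as in the usage snippet.

```python
from typing import Any, Mapping

# Hypothetical thresholds chosen for illustration; the dataset card does not
# recommend any particular duration range.
MIN_SECONDS = 1.0
MAX_SECONDS = 15.0


def clip_duration(example: Mapping[str, Any]) -> float:
    """Return the clip length in seconds, computed from the mono waveform."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def keep_for_tts(
    example: Mapping[str, Any],
    min_s: float = MIN_SECONDS,
    max_s: float = MAX_SECONDS,
) -> bool:
    """True if the transcription is non-empty and the duration is in range."""
    if not example["text"].strip():
        return False
    return min_s <= clip_duration(example) <= max_s


# Synthetic stand-in for one dataset row: 2 s of silence at 24 kHz.
row = {
    "audio": {"array": [0.0] * 48000, "sampling_rate": 24000},
    "text": "placeholder transcription",
}
print(clip_duration(row))  # 2.0
print(keep_for_tts(row))   # True
```

With the real data, a predicate like this could be passed to `train.filter(keep_for_tts)`. Because the full download is about 30 GB, loading with `streaming=True` may be a more practical way to try it on a few examples first.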
Provider:
ghananlpcommunity