ghananlpcommunity/navigation-corpus-twi-speech
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/navigation-corpus-twi-speech
Description:
---
language:
- twi
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
tags:
- speech
- twi
- ghana
- african-languages
- low-resource
- sentence-splits
- ctc-aligned
- vad-trimmed
pretty_name: Twi Sentence Speech Segments
---
# Twi Speech Segments (sentence splitting)
52,562 speech-text pairs, split at sentence boundaries from long recordings.
## Processing pipeline
1. Source audio from `ghananlpcommunity/navigation-corpus-speech-full-twi`
2. Full-file CTC forced alignment (MMS-300M) for word-level timestamps
3. Split at sentence boundaries (. ? !); long sentences re-chunked into pieces of up to 16 words
4. Leading/trailing silence trimmed with VAD (-40 dBFS threshold)
5. Filtered by duration: minimum 1.0 s, maximum 15.0 s
6. Original sample rate preserved
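The splitting, trimming, and filtering steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code. It assumes the aligner yields `(word, start_sec, end_sec)` tuples and that audio is a list of float samples in [-1, 1]. All function names are hypothetical, and the 20 ms frame size and RMS-based level measure are assumptions, since the card only states the -40 dBFS threshold. For brevity the duration filter here runs on alignment timestamps rather than on the trimmed audio.

```python
import math
import re

MAX_WORDS = 16                 # step 3: re-chunk limit
SILENCE_DBFS = -40.0           # step 4: VAD threshold
MIN_SEC, MAX_SEC = 1.0, 15.0   # step 5: duration bounds

SENTENCE_END = re.compile(r"[.?!]$")


def split_sentences(words):
    """Group (word, start, end) tuples into sentences ending in . ? or !"""
    sentences, current = [], []
    for item in words:
        current.append(item)
        if SENTENCE_END.search(item[0]):
            sentences.append(current)
            current = []
    if current:  # trailing words with no closing punctuation
        sentences.append(current)
    return sentences


def rechunk(sentence, max_words=MAX_WORDS):
    """Break a sentence into consecutive pieces of at most max_words."""
    return [sentence[i:i + max_words] for i in range(0, len(sentence), max_words)]


def frame_dbfs(frame):
    """RMS level of a frame in dBFS (full scale assumed to be 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def trim_silence(samples, rate, frame_ms=20, threshold=SILENCE_DBFS):
    """Drop leading and trailing frames quieter than the threshold."""
    size = max(1, int(rate * frame_ms / 1000))
    frames = [samples[i:i + size] for i in range(0, len(samples), size)]
    loud = [i for i, f in enumerate(frames) if frame_dbfs(f) > threshold]
    if not loud:
        return []
    return samples[loud[0] * size:(loud[-1] + 1) * size]


def keep(duration_sec):
    """Duration filter from step 5."""
    return MIN_SEC <= duration_sec <= MAX_SEC


def segment(words):
    """Return (text, start, end) for every chunk that passes the filter."""
    out = []
    for sentence in split_sentences(words):
        for chunk in rechunk(sentence):
            start, end = chunk[0][1], chunk[-1][2]
            if keep(end - start):
                out.append((" ".join(w for w, _, _ in chunk), start, end))
    return out
```

Calling `segment(words)` on the aligner output gives the candidate clips. Each clip's samples would then go through `trim_silence` before being written out at the source sample rate.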
## Usage
```python
from datasets import load_dataset
ds = load_dataset("ghananlpcommunity/navigation-corpus-twi-speech", split="train")
```
Provider:
ghananlpcommunity



