ghananlpcommunity/ghana-female-twi-8sec-splits
Source: Hugging Face. Updated 2026-03-29; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/ghana-female-twi-8sec-splits
Description:
---
language:
- twi
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
tags:
- speech
- twi
- ghana
- african-languages
- low-resource
- 8gram-splits
- ctc-aligned
- vad-trimmed
pretty_name: Twi 8-Word Speech Segments
---
# Twi 8-Word Speech Segments
25,951 speech-text pairs, split from 30-minute recordings.
## Processing pipeline
1. Source audio from `ghananlpcommunity/ghana-female-twi-tts-full-length`
2. Full-file CTC forced alignment (MMS-300M) for word-level timestamps
3. Words grouped into 8-word (8-gram) segments
4. Leading/trailing silence trimmed with VAD (-40 dBFS threshold)
5. Segments filtered by duration: min 1.0 s, max 15.0 s
6. Original sample rate preserved (24kHz)
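
Steps 3 to 5 can be illustrated with a minimal sketch. This is not the code used to build the dataset: the function names, the 10 ms frame size, and the simple frame-energy check standing in for the VAD are all assumptions made for illustration. Only the 8-word grouping, the -40 dBFS threshold, the 1.0 s / 15.0 s limits, and the 24 kHz rate come from the pipeline description. How a final group with fewer than 8 words is handled is not stated; here it is simply kept.

```python
import math
from typing import List, Tuple

Word = Tuple[str, float, float]  # (word, start_sec, end_sec), e.g. from forced alignment


def group_words(words: List[Word], size: int = 8) -> List[Tuple[str, float, float]]:
    """Group consecutive aligned words into segments of `size` words."""
    segments = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        text = " ".join(w for w, _, _ in chunk)
        # Segment spans from the first word's start to the last word's end.
        segments.append((text, chunk[0][1], chunk[-1][2]))
    return segments


def frame_dbfs(frame: List[float]) -> float:
    """RMS level of a frame in dBFS, for samples scaled to [-1.0, 1.0]."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def trim_silence(samples: List[float], sample_rate: int = 24000,
                 threshold_db: float = -40.0, frame_ms: int = 10) -> List[float]:
    """Drop leading/trailing frames quieter than threshold_db (assumed energy-based VAD)."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, f in enumerate(frames) if frame_dbfs(f) > threshold_db]
    if not voiced:
        return []
    start = voiced[0] * frame_len
    end = min(len(samples), (voiced[-1] + 1) * frame_len)
    return samples[start:end]


def keep_segment(n_samples: int, sample_rate: int = 24000,
                 min_s: float = 1.0, max_s: float = 15.0) -> bool:
    """Duration filter applied after trimming."""
    duration = n_samples / sample_rate
    return min_s <= duration <= max_s
```

In this sketch, a segment's audio would be cut out using the start and end times from `group_words`, passed through `trim_silence`, and kept only if `keep_segment` returns `True` for the trimmed length.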
## Usage
```python
from datasets import load_dataset
ds = load_dataset("ghananlpcommunity/ghana-female-twi-8sec-splits", split="train")
```
Provider:
ghananlpcommunity



