
ghananlpcommunity/ghana-female-twi-speech-asr-8word-splits

Hugging Face | Updated 2026-03-29 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/ghana-female-twi-speech-asr-8word-splits
Description:
---
language:
- twi
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
tags:
- speech
- twi
- ghana
- african-languages
- low-resource
- 8gram-splits
- ctc-aligned
- vad-trimmed
pretty_name: Twi 8-Word Speech Segments
---

# Twi 8-Word Speech Segments

51,139 speech-text pairs, split from 30-minute recordings.

## Processing pipeline

1. Source audio taken from `ghananlpcommunity/ghana-female-twi-tts-full-length`
2. Full-file CTC forced alignment (MMS-300M) to obtain word-level timestamps
3. Words grouped into 8-word (8-gram) segments
4. Leading and trailing silence trimmed with VAD (-40 dBFS threshold)
5. Segments filtered by duration: minimum 1.0 s, maximum 15.0 s
6. Original sample rate preserved (24 kHz)

## Usage

```python
from datasets import load_dataset

# Repository ID taken from this page's title and download link.
ds = load_dataset(
    "ghananlpcommunity/ghana-female-twi-speech-asr-8word-splits",
    split="train",
)
```
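
Steps 3 and 5 of the processing pipeline (grouping aligned words into 8-word segments, then filtering by duration) can be sketched roughly as follows. This is only an illustration, not the dataset authors' code: the `Word` class and `group_words` function are hypothetical names, the word timestamps are assumed to come from the forced-alignment step, VAD trimming is omitted, and keeping a final segment shorter than 8 words is an assumption the card does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One aligned word (hypothetical structure; times in seconds)."""
    text: str
    start: float
    end: float


def group_words(words, n=8, min_s=1.0, max_s=15.0):
    """Group consecutive words into n-word segments and keep only
    segments whose duration lies within [min_s, max_s]."""
    segments = []
    for i in range(0, len(words), n):
        chunk = words[i:i + n]  # the last chunk may hold fewer than n words
        start, end = chunk[0].start, chunk[-1].end
        if min_s <= end - start <= max_s:
            segments.append({
                "text": " ".join(w.text for w in chunk),
                "start": start,
                "end": end,
            })
    return segments
```

Each returned segment carries the start and end times that would be used to cut the source audio; the silence trimming in step 4 would then be applied to each cut before the duration filter if the order in the card is followed exactly.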
Provider:
ghananlpcommunity