
ghananlpcommunity/ghana-female-twi-8sec-splits

Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/ghana-female-twi-8sec-splits
Description:
---
language:
- twi
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
tags:
- speech
- twi
- ghana
- african-languages
- low-resource
- 8gram-splits
- ctc-aligned
- vad-trimmed
pretty_name: Twi 8-Word Speech Segments
---

# Twi 8-Word Speech Segments

25951 speech-text pairs split from 30-minute recordings.

## Processing pipeline

1. Source audio from `ghananlpcommunity/ghana-female-twi-tts-full-length`
2. Full-file CTC forced alignment (MMS-300M) for word-level timestamps
3. Words grouped into 8-word (8-gram) segments
4. Leading/trailing silence trimmed with VAD (-40 dBFS threshold)
5. Filtered: min 1.0s, max 15.0s
6. Original sample rate preserved (24kHz)

## Usage

```python
from datasets import load_dataset

# Load the training split from the Hugging Face Hub
ds = load_dataset("ghananlpcommunity/ghana-female-twi-8sec-splits", split="train")
```
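The grouping and duration-filter steps of the processing pipeline can be illustrated with a minimal sketch. This is not the dataset authors' code. The `Word` and `Segment` types and the `group_words` and `filter_by_duration` helpers are hypothetical names introduced here. The sketch assumes non-overlapping windows of eight words, inclusive duration bounds, and that a shorter final window is kept, none of which the card specifies. VAD trimming is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One aligned word with start/end timestamps in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Segment:
    """A run of consecutive words treated as one speech-text pair (hypothetical type)."""
    text: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def group_words(words: List[Word], words_per_segment: int = 8) -> List[Segment]:
    """Group aligned words into consecutive, non-overlapping windows.

    Assumption: the last window may hold fewer than `words_per_segment` words.
    """
    segments = []
    for i in range(0, len(words), words_per_segment):
        chunk = words[i:i + words_per_segment]
        segments.append(
            Segment(
                text=" ".join(w.text for w in chunk),
                start=chunk[0].start,
                end=chunk[-1].end,
            )
        )
    return segments


def filter_by_duration(
    segments: List[Segment], min_s: float = 1.0, max_s: float = 15.0
) -> List[Segment]:
    """Keep segments whose duration lies within [min_s, max_s] (inclusive, by assumption)."""
    return [s for s in segments if min_s <= s.duration <= max_s]
```

In this sketch, the word timestamps would come from the forced-alignment step, and each surviving segment's `start` and `end` would then be used to cut the corresponding audio span before silence trimming.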
Provider:
ghananlpcommunity