
ghananlpcommunity/navigation-corpus-twi-speech

Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ghananlpcommunity/navigation-corpus-twi-speech
Resource description:
---
language:
- twi
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
tags:
- speech
- twi
- ghana
- african-languages
- low-resource
- sentence-splits
- ctc-aligned
- vad-trimmed
pretty_name: Twi Sentence Speech Segments
---

# Twi Speech Segments (sentence splitting)

52562 speech-text pairs split from long recordings.

## Processing pipeline

1. Source audio from `ghananlpcommunity/navigation-corpus-speech-full-twi`
2. Full-file CTC forced alignment (MMS-300M) for word-level timestamps
3. Sentence-boundary splits (. ? !); long sentences re-chunked to 16 words
4. Leading/trailing silence trimmed with VAD (-40 dBFS threshold)
5. Filtered: min 1.0s, max 15.0s
6. Original sample rate preserved

## Usage

```python
from datasets import load_dataset

ds = load_dataset("ghananlpcommunity/navigation-corpus-twi-speech", split="train")
```
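
Steps 3 and 5 of the processing pipeline can be illustrated with a minimal sketch. This is not the dataset authors' code: the function names `split_sentences` and `keep_segment` are hypothetical, the sketch works on plain text rather than on the word-level timestamps the real pipeline aligns against, and treating the duration bounds as inclusive is an assumption.

```python
import re

MAX_WORDS = 16                  # re-chunk limit stated in step 3
MIN_SEC, MAX_SEC = 1.0, 15.0    # duration bounds stated in step 5


def split_sentences(text):
    """Split text after . ? ! and re-chunk long sentences to MAX_WORDS words.

    Hypothetical helper: the actual pipeline presumably cuts audio at the
    aligned word timestamps; here only the text side is shown.
    """
    # Split on whitespace that follows a sentence-final punctuation mark.
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    chunks = []
    for sentence in sentences:
        words = sentence.split()
        # Sentences longer than MAX_WORDS are broken into consecutive chunks.
        for start in range(0, len(words), MAX_WORDS):
            chunks.append(" ".join(words[start:start + MAX_WORDS]))
    return chunks


def keep_segment(num_samples, sample_rate):
    """Return True if the segment duration is within [MIN_SEC, MAX_SEC].

    Inclusive bounds are an assumption; the card only states min and max.
    """
    duration = num_samples / sample_rate
    return MIN_SEC <= duration <= MAX_SEC
```

For example, `keep_segment(8000, 16000)` rejects a 0.5 s clip, while a 40-word sentence passed to `split_sentences` comes back as three chunks of 16, 16, and 8 words.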
Provided by:
ghananlpcommunity