CLEAR-Global/Kenyan-Swahili-Speech
Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/CLEAR-Global/Kenyan-Swahili-Speech
Description:
A single-speaker read-speech dataset in Kenyan Swahili, containing approximately 6 hours of prompted recordings from one anonymous male speaker. The dataset was produced as part of CLEAR Global's Gamayun Language Data Kits initiative, which develops open-source language resources for under-resourced languages used in humanitarian contexts.
- **Language**: Swahili (Kenyan variety)
- **Speaker**: 1 anonymous male Kenyan speaker
- **Recording type**: Prompted read speech (speaker read sentences aloud from a script)
- **Total duration**: ~6 hours (21,852 seconds)
- **Utterances**: 4,700
- **Audio format**: WAV, pre-segmented
- **Transcriptions**: Swahili translations of sentences from the Tatoeba repository
The sentence set is shared with CLEAR Global's Gamayun Swahili–English parallel text kit. English source sentences were selected from Tatoeba using a frequency-based algorithm; Swahili translations were produced by the CLEAR Global translator community.
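
As a quick sanity check on the figures quoted above, the sketch below derives the total duration in hours and the mean utterance length from the two stated totals (21,852 seconds and 4,700 utterances). It uses only those two numbers; the function name `summarize` is illustrative and not part of the dataset or any library, and per-file durations are not given here, so the mean is an average only.

```python
# Totals quoted in the dataset summary.
TOTAL_SECONDS = 21_852
UTTERANCES = 4_700


def summarize(total_seconds: int, utterances: int) -> dict:
    """Derive total hours and mean utterance length from the quoted totals."""
    return {
        "hours": total_seconds / 3600,
        "mean_utterance_seconds": total_seconds / utterances,
    }


stats = summarize(TOTAL_SECONDS, UTTERANCES)
print(
    f"{stats['hours']:.2f} h total, "
    f"{stats['mean_utterance_seconds']:.2f} s per utterance"
)
# prints "6.07 h total, 4.65 s per utterance"
```

The result is consistent with the stated "~6 hours" and suggests short, sentence-length clips, which fits prompted read speech drawn from Tatoeba sentences.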
Provider:
CLEAR-Global



