
CLEAR-Global/Kenyan-Swahili-Speech

Source: Hugging Face. Updated 2026-04-24; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/CLEAR-Global/Kenyan-Swahili-Speech
Description:

A single-speaker read-speech dataset in Kenyan Swahili, containing approximately 6 hours of prompted recordings from one anonymous male speaker. The dataset was produced as part of CLEAR Global's Gamayun Language Data Kits initiative, which develops open-source language resources for under-resourced languages used in humanitarian contexts.

- **Language**: Swahili (Kenyan variety)
- **Speaker**: 1 anonymous male Kenyan speaker
- **Recording type**: Prompted read speech (the speaker read sentences aloud from a script)
- **Total duration**: ~6 hours (21,852 seconds)
- **Utterances**: 4,700
- **Audio format**: WAV, pre-segmented
- **Transcriptions**: Swahili translations of sentences from the Tatoeba repository

The sentence set is shared with CLEAR Global's Gamayun Swahili–English parallel text kit. English source sentences were selected from Tatoeba using a frequency-based algorithm; Swahili translations were produced by the CLEAR Global translator community.
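The stated figures can be sanity-checked with a little arithmetic, and the total duration can be re-measured from a local copy of the audio. The following is a minimal sketch, not part of the dataset's tooling: the function names and the directory argument are hypothetical, and it assumes the segments are uncompressed PCM WAV files, which is the only kind Python's standard `wave` module reads.

```python
import wave
from pathlib import Path

# Figures quoted in the dataset description.
STATED_TOTAL_SECONDS = 21_852
STATED_UTTERANCES = 4_700


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def corpus_duration_seconds(root):
    """Sum the durations of all .wav files under `root` (hypothetical local path)."""
    return sum(wav_duration_seconds(p) for p in sorted(Path(root).rglob("*.wav")))


# Derived from the stated figures only; no audio is needed for these.
stated_hours = STATED_TOTAL_SECONDS / 3600  # about 6.07 hours
mean_utterance_seconds = STATED_TOTAL_SECONDS / STATED_UTTERANCES  # about 4.65 seconds
```

Calling `corpus_duration_seconds` on the directory holding the downloaded segments should give a value close to 21,852 if the local copy is complete. The mean of roughly 4.65 seconds per utterance is consistent with short, sentence-level prompts.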
Provider:
CLEAR-Global