Kallaama
Source: arXiv. Updated: 2024-04-02. Indexed: 2024-06-21.
Download link:
https://github.com/gauthelo/kallaama-speech-dataset
Resource description:
The Kallaama dataset was created jointly by Orange Innovation, Jokalante SARL, and the University of Thiès. It provides agriculture-related speech data in three of Senegal's main languages: Wolof, Pulaar, and Sereer. The dataset contains 125 hours of recordings and is intended to support the development of automatic speech recognition (ASR) technology. It consists of natural, spontaneous speech from the agricultural domain, suitable for training large-vocabulary speech recognition models. The recordings come from several sources, including interactive radio programs, focus groups, voice messages, and interviews. Its main application is improving access to agricultural information through speech technology, particularly in rural areas of Senegal, with the aim of reducing language barriers and the digital divide.
Provided by:
Orange Innovation, Jokalante SARL, University of Thiès
Created:
2024-04-02



