Makerere Radio Speech Corpus
Source: arXiv | Updated: 2022-06-20 | Indexed: 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.5855017
Overview:
The Makerere Radio Speech Corpus was created by the Artificial Intelligence Research Lab at Makerere University to support the development of automatic speech recognition (ASR) systems, particularly for low-resource languages. It contains 155 hours of Luganda radio speech covering a range of broadcast conditions: background noise, telephone speech, studio speech, news reports, and advertisements. The audio was collected from online Luganda radio stations and transcribed according to strict transcription guidelines. Intended applications include radio monitoring, the advancement of speech recognition technology, and information extraction in support of national development planning.
Provider:
Makerere University
Created:
2022-06-20



