BoburAmirov/it_youtube_uzbek_speech_dataset
Hugging Face | Updated 2025-12-11 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/BoburAmirov/it_youtube_uzbek_speech_dataset
Description:
This dataset contains audio clips in Uzbek, with some English mixed in to improve generalization, together with their corresponding transcriptions. The data was collected from publicly available YouTube videos about information technology (IT) and is designed for training and evaluating automatic speech recognition (ASR) models. Most of the content comes from the Mohir Dev YouTube channel (with thanks to the team for advancing AI in Uzbekistan). The transcriptions were produced with Gemini 2.5 Pro and then intelligently filtered. The audio clips are segmented and formatted for easy use with modern deep learning frameworks.
Provided by:
BoburAmirov
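As a minimal sketch of how the dataset might be used for training and evaluating ASR models, the Python snippet below wraps loading through the Hugging Face `datasets` library and scores a model transcription against a reference with word error rate (WER). WER is a common ASR metric, not one the dataset card prescribes. Assumptions: the split name you pass exists, and `load_split` and `word_error_rate` are hypothetical helper names. The column names are not documented here, so inspect them rather than guessing. To download through the mirror linked above, `huggingface_hub` reads the `HF_ENDPOINT` environment variable (for example `HF_ENDPOINT=https://hf-mirror.com`).

```python
"""Sketch: load the dataset and score ASR output with word error rate (WER).

Assumptions (not from the dataset card): the split name passed to load_split
exists; load_split and word_error_rate are hypothetical helper names.
"""


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub (needs `pip install datasets`
    and network access). Imported lazily so word_error_rate works without it."""
    from datasets import load_dataset

    return load_dataset("BoburAmirov/it_youtube_uzbek_speech_dataset", split=split)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference
    words, using a word-level Levenshtein distance over whitespace tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing row i, prev[j] is the edit distance between
    # the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `ds = load_split()` followed by `print(ds.column_names)` shows the actual fields. `word_error_rate("salom dunyo", "salom dunyo")` returns `0.0`, and one substituted word out of three gives about 0.33. No text normalization (lowercasing, punctuation stripping) is applied, so in practice you may want to normalize both strings the same way before scoring.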



