BoburAmirov/news_youtube_uzbek_speech_dataset
Source: Hugging Face | Updated: 2025-12-11 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/BoburAmirov/news_youtube_uzbek_speech_dataset
Description:
This dataset contains audio clips of Uzbek speech, covering several dialects, paired with their transcriptions. The audio was collected from publicly available news videos on YouTube, most of it from the Kunuz and Qalampir channels. The transcriptions were generated with Gemini 2.5 Pro and then filtered. The clips are segmented and formatted for direct use with modern deep learning frameworks, and the dataset is intended for training and evaluating Automatic Speech Recognition (ASR) models.
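Since the dataset is meant for evaluating ASR models, a common way to score a model against the reference transcriptions is word error rate (WER). The listing does not say which metric the author uses, so the following is only a minimal sketch under that assumption. The function name `word_error_rate` and the sample strings are hypothetical and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping is applied here).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only; they do not come from the dataset.
perfect = word_error_rate("bugun havo yaxshi", "bugun havo yaxshi")
one_missing = word_error_rate("bugun havo yaxshi", "bugun yaxshi")
```

In practice the reference and hypothesis text are usually normalized (case, punctuation, numerals) before scoring; which normalization suits these Gemini-generated transcriptions is left open here.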
Provider:
BoburAmirov



