
BoburAmirov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset

Source: Hugging Face. Updated 2025-12-11; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/BoburAmirov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset
Description:
This dataset contains Uzbek audio clips, mostly in the Tashkent dialect, paired with their transcriptions. The data was collected from publicly available podcast videos on YouTube and is intended for training and evaluating Automatic Speech Recognition (ASR) models. Most of the content comes from YouTube videos of the Jahongir Latipov interviews and the Bu podcast (with respect to the authors). The transcripts were produced with Gemini 2.5 Pro and then intelligently filtered. The audio clips are segmented and formatted for easy use with modern deep learning frameworks.
Provided by:
BoburAmirov
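
Since the description says the clips are formatted for use with deep learning frameworks, a minimal sketch of inspecting the data with the Hugging Face `datasets` library may be useful. The split name `"train"` is an assumption, as are the helper names `open_stream` and `column_names` (both hypothetical); the listing does not state the dataset's column names, so the sketch discovers them rather than guessing.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List

REPO_ID = "BoburAmirov/podcasts_tashkent_dialect_youtube_uzbek_speech_dataset"


def open_stream(repo_id: str = REPO_ID, split: str = "train"):
    """Open the dataset in streaming mode (no full download).

    Assumption: a split named "train" exists; adjust if it does not.
    """
    # Imported lazily so column_names() can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


def column_names(records: Iterable[Dict[str, Any]], n: int = 3) -> List[str]:
    """Return the sorted union of keys found in the first n records."""
    seen = set()
    for record in islice(records, n):
        seen.update(record.keys())
    return sorted(seen)
```

Calling `column_names(open_stream())` should list the fields available in the first few records. It needs network access, and decoding the audio may require an additional audio backend to be installed.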