nicsaurabhsharma/punjabi-asr
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/nicsaurabhsharma/punjabi-asr
Description:
Shrutilipi is a labelled ASR corpus obtained by mining parallel audio and text pairs at the document scale from All India Radio news bulletins for 12 Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Sanskrit, Tamil, Telugu, and Urdu. The corpus contains over 6,400 hours of data across all languages. Each example has three features: the audio, its transcription, and an English translation. The Punjabi portion contains 39,238 training examples and is approximately 10.9 GB in size. The dataset was created to improve ASR systems for low-resource languages.
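Because the Punjabi portion is about 10.9 GB, it can be convenient to stream a few examples before committing to a full download. The sketch below is a minimal, non-authoritative example assuming the Hugging Face `datasets` library and the Hub ID taken from the download link. The column names `audio` and `transcription` are hypothetical (the card lists the features but not their column names), and the sketch assumes dict-style access to `"array"` and `"sampling_rate"` on decoded audio; depending on the `datasets` version, the decoded audio type may differ.

```python
"""Sketch: stream the Punjabi split and summarize a sample of it.

Assumptions (not confirmed by the dataset card):
- column names "audio" and "transcription" are hypothetical;
- decoded audio supports ["array"] and ["sampling_rate"] lookups.
"""
from typing import Any, Iterable, Mapping


def load_punjabi_stream(split: str = "train"):
    """Return a streaming dataset so the full ~10.9 GB is not downloaded up front.

    Needs the third-party `datasets` package and network access; the import is
    done lazily so the summary helper below works offline.
    """
    from datasets import load_dataset

    return load_dataset("nicsaurabhsharma/punjabi-asr", split=split, streaming=True)


def summarize(
    examples: Iterable[Mapping[str, Any]],
    audio_key: str = "audio",          # hypothetical column name
    text_key: str = "transcription",   # hypothetical column name
) -> dict:
    """Count examples, total audio duration in hours, and whitespace-separated words."""
    count = 0
    seconds = 0.0
    words = 0
    for ex in examples:
        audio = ex[audio_key]
        # Duration of one clip = number of samples / samples per second.
        seconds += len(audio["array"]) / audio["sampling_rate"]
        words += len(ex[text_key].split())
        count += 1
    return {"examples": count, "hours": seconds / 3600, "words": words}
```

For example, `summarize(itertools.islice(load_punjabi_stream(), 100))` would inspect only the first 100 streamed examples. To route requests through the hf-mirror.com mirror instead of huggingface.co, `huggingface_hub` reads the `HF_ENDPOINT` environment variable (e.g. `HF_ENDPOINT=https://hf-mirror.com`), which should be set before `datasets` is imported.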
Provided by:
nicsaurabhsharma



