MadLook/arabic-whisper-multidialect
Source: Hugging Face | Updated: 2025-11-13 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/MadLook/arabic-whisper-multidialect
Description:
The Arabic Whisper Multi-Dialect ASR Dataset is a multi-dialect Arabic speech recognition dataset prepared for fine-tuning Whisper models. It contains high-quality Arabic speech from several dialects, including Modern Standard Arabic, Egyptian Arabic, and Gulf Arabic. The dataset has approximately 105,000 audio-text pairs, split into training, validation, and test sets that account for 90%, 5%, and 5% of the data, respectively. Audio is 16 kHz mono, and the transcripts are in normalized Arabic script.
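As a rough illustration of the split proportions in the description, the sketch below works out approximate per-split counts from the stated total. The split names (`train`, `validation`, `test`) are assumptions, and the total is the approximate figure from the description rather than an exact count. The optional `load_splits` helper is hypothetical: it assumes the dataset can be fetched with the Hugging Face `datasets` library under the repository ID shown in the title, and it is not called here because it needs network access.

```python
# Approximate split sizes implied by the description (about 105,000 pairs, 90/5/5).
TOTAL_PAIRS = 105_000  # approximate total taken from the description
SPLIT_PERCENT = {"train": 90, "validation": 5, "test": 5}  # split names are assumed


def split_sizes(total, percents):
    """Return the approximate number of examples per split (integer arithmetic)."""
    return {name: total * pct // 100 for name, pct in percents.items()}


sizes = split_sizes(TOTAL_PAIRS, SPLIT_PERCENT)
print(sizes)


def load_splits(repo_id="MadLook/arabic-whisper-multidialect"):
    """Hypothetical helper: download the dataset with the `datasets` library."""
    # Imported lazily so the arithmetic above runs without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

The actual split names and exact counts should be checked against the dataset card before relying on these numbers.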
Provider:
MadLook



