CUAIStudents/Ar-ASR
Source: Hugging Face. Updated 2025-05-09; indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/CUAIStudents/Ar-ASR
Resource overview:
The Ar-ASR dataset is designed for automatic speech recognition (ASR) of Arabic speech, with precise transcriptions that include tashkeel (diacritics). It contains 33,607 audio samples drawn from multiple sources: the Microsoft Edge TTS API, Common Voice (validated Arabic subset), individual contributions, and manually transcribed YouTube videos. Each audio sample is paired with an aligned Arabic text transcription. The dataset is intended for training and evaluating ASR models such as OpenAI's Whisper, with an emphasis on accurate recognition of Arabic pronunciation and diacritics.
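Because the transcriptions carry tashkeel, it can help to compare a model's output both with and without diacritics when evaluating. The following is a minimal sketch of that normalization step, not part of the dataset or its tooling. The helper names `strip_tashkeel` and `has_tashkeel` are hypothetical, and the choice of marks to treat as tashkeel (U+064B through U+0652, plus the superscript alef U+0670) is an assumption that may need adjusting for a given evaluation.

```python
import re

# Assumed tashkeel set: fathatan through sukun (U+064B-U+0652)
# plus the superscript (dagger) alef (U+0670).
TASHKEEL = re.compile(r"[\u064B-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Return the text with the assumed Arabic diacritic marks removed."""
    return TASHKEEL.sub("", text)


def has_tashkeel(text: str) -> bool:
    """Return True if the text contains at least one assumed diacritic mark."""
    return TASHKEEL.search(text) is not None


if __name__ == "__main__":
    # "kataba" written with a fatha on each of its three letters.
    diacritized = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(has_tashkeel(diacritized))
    print(strip_tashkeel(diacritized))
```

Scoring a hypothesis against both the original reference and its stripped form separates errors in the base letters from errors in the diacritics alone.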
Provider:
CUAIStudents



