L2AraSpeech

IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/l2araspeech
Description:
The L2AraSpeech dataset is a novel speech corpus specifically designed to advance Computer-Aided Pronunciation Training (CAPT) for Arabic as a Second Language. Developed to address the scarcity of non-native Arabic speech databases, this corpus comprises audio recordings from 220 non-native speakers from diverse national and linguistic backgrounds. Of these 220 speakers, we selected 60, covering almost all of the L1 languages, for pronunciation-error annotation. The recorded text consists of 25 sentences and 61 minimal pairs, chosen by expert linguists with many years of experience in teaching Arabic to non-Arabs. The dataset is organized into three main components: Raw_Data, Processed_Data, and Annotated_Data, reflecting different stages of data preparation and expert annotation.
Provided by:
Mohammed Algabri