L2AraSpeech

Name: L2AraSpeech
Creator: Mohammed Algabri
License: No description available

Source: IEEE; indexed 2026-04-17

Download link:

https://ieee-dataport.org/documents/l2araspeech

Description:

The L2AraSpeech dataset is a novel speech corpus designed to advance Computer-Aided Pronunciation Training (CAPT) for Arabic as a second language. Developed to address the scarcity of non-native Arabic speech databases, the corpus comprises audio recordings from 220 non-native speakers of diverse national and linguistic backgrounds. Of these 220 speakers, we selected 60, covering almost all of the L1 languages, for pronunciation-error annotation. The recorded text consists of 25 sentences and 61 minimal pairs, chosen by expert linguists with many years of experience teaching Arabic to non-native speakers. The dataset is organized into three main components, Raw_Data, Processed_Data, and Annotated_Data, which reflect different stages of data preparation and expert annotation.
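
As a rough illustration of the three-component organization described above, the following Python sketch counts audio files under each component of a locally extracted copy. Only the component names Raw_Data, Processed_Data, and Annotated_Data come from the description. The assumption that they appear as top-level directories, the audio file extensions, and the helper name `summarize_components` are all hypothetical and may not match the actual archive.

```python
from pathlib import Path

# Component names come from the dataset description. Treating them as
# top-level directories is an assumption made for illustration.
COMPONENTS = ("Raw_Data", "Processed_Data", "Annotated_Data")

# Assumed audio extensions; adjust to whatever the archive actually contains.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def summarize_components(root):
    """Count audio files under each component directory.

    Returns a dict mapping each component name to its audio file count,
    or to None when that directory does not exist under root.
    """
    root = Path(root)
    summary = {}
    for name in COMPONENTS:
        component_dir = root / name
        if not component_dir.is_dir():
            summary[name] = None
            continue
        # Walk the component recursively and count files with an audio suffix.
        summary[name] = sum(
            1
            for path in component_dir.rglob("*")
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
        )
    return summary
```

Calling `summarize_components("path/to/extracted/dataset")` gives a quick check that all three components are present before any further processing. A value of None flags a component directory that was not found at the assumed location.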

Provider:

Mohammed Algabri
