
Unlabelled Arabic Speech Dataset

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://data.mendeley.com/datasets/yhy76b2c7b
Description:
Speech is central to communication, and its importance has grown in the era of artificial intelligence. Many fields, including security and forensics, rely on speech analysis, for example to identify people, but Arabic-language data poses a challenge: most speech analysis models are trained primarily on English data and do not work well with Arabic. To address this, a large dataset of over a million Arabic samples was built for training models and improving their performance. The samples were collected from Arabic podcasts, so they are clear and noise-free. The audio recordings were extracted and split into one-second segments, and each segment was then transformed into a Mel-spectrogram using a 22 kHz sampling rate, a frame size of 2048 samples, and a hop length of 512 samples. Researchers working with Arabic speech data can use this dataset to experiment with different machine learning and deep learning models. To improve Arabic speaker identification, we used the dataset to train a Siamese speech embedding model and tested its performance on a benchmark dataset; the results are reported in the paper titled "Unsupervised Arabic Speech Embedding Model for Speaker Identification": https://ieeexplore-ieee-org.aus.idm.oclc.org/document/10191576.
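The segmentation and framing parameters in the description can be illustrated with a short sketch. This is not the authors' pipeline. It uses only the Python standard library and shows how a recording might be cut into one-second segments and how many spectrogram frames each segment would yield with a frame size of 2048 and a hop length of 512. The function names are hypothetical, "22 kHz" is assumed to mean 22,050 Hz, and the description does not say whether the frames were centered (padded) or what happened to a trailing partial second, so both framing variants are shown and the remainder is simply dropped.

```python
# Assumption: "22 kHz" means 22,050 Hz; the description does not give the exact value.
SAMPLE_RATE = 22_050
FRAME_SIZE = 2048   # analysis window length in samples (from the description)
HOP_LENGTH = 512    # step between successive frames in samples (from the description)


def split_into_seconds(samples, sample_rate=SAMPLE_RATE):
    """Cut a sequence of samples into non-overlapping one-second segments.

    A trailing remainder shorter than one second is dropped (an assumption).
    """
    n_full = len(samples) // sample_rate
    return [samples[i * sample_rate:(i + 1) * sample_rate] for i in range(n_full)]


def frame_count(n_samples, frame_size=FRAME_SIZE, hop_length=HOP_LENGTH, centered=True):
    """Number of short-time analysis frames for a segment of n_samples.

    centered=True models padding the segment by frame_size // 2 on each side,
    so a frame is centered on every hop position. centered=False counts only
    the frames that fit entirely inside the unpadded segment.
    """
    if centered:
        return 1 + n_samples // hop_length
    if n_samples < frame_size:
        return 0
    return 1 + (n_samples - frame_size) // hop_length


if __name__ == "__main__":
    # 3.5 seconds of silence as stand-in audio (hypothetical input).
    audio = [0.0] * int(3.5 * SAMPLE_RATE)
    segments = split_into_seconds(audio)
    print(len(segments), "segments of", len(segments[0]), "samples")
    print("frames per segment (centered):", frame_count(len(segments[0])))
    print("frames per segment (uncentered):", frame_count(len(segments[0]), centered=False))
```

Under these assumptions a one-second segment holds 22,050 samples and produces 44 frames with centering or 40 without, so each Mel-spectrogram would be roughly 40 to 44 columns wide. The number of Mel bands is not stated in the description and is left out here.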
Created:
2023-09-18