Download link:

https://modelscope.cn/datasets/MBZUAI/ArVoice

Resource description:

<h2 align="center"> <b>ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis</b> </h2>
<p align="center"> Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki </p>
<p align="center"> {hawau.toyin, hanan.aldarmaki}@mbzuai.ac.ae </p>

<div style="font-size: 16px; text-align: justify;">
<p>ArVoice is a multi-speaker Modern Standard Arabic (MSA) speech corpus with fully diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. <br>
ArVoice comprises: (1) professionally recorded audio by 2 male and 2 female voice artists from diacritized transcripts, (2) professionally recorded audio by 1 male and 1 female voice artist from undiacritized transcripts, (3) a modified subset of the Arabic Speech Corpus (ASC), and (4) synthesized speech from commercial TTS systems. The complete corpus totals 83.52 hours of speech across 11 voices; about 10 hours are human speech from 7 speakers. <br> <br>
<strong> This repo contains only Part (3), the ASC subset, and Part (4), the synthetic subset </strong>; to access the main subset, Parts (1) and (2), which covers six professional speakers, <a href="https://huggingface.co/datasets/MBZUAI/ArVoice/resolve/main/ArVoice%20DUA.pdf"> please sign this agreement</a> and email it to us. <br><br>
If you use the dataset or transcriptions provided on Hugging Face, <u>please cite the paper</u>.
</p> </div>

### Usage Example

```python
from datasets import load_dataset

# Second argument options: Human_3, Synthetic
df = load_dataset("MBZUAI/ArVoice", "Human_3")
print(df)
# DatasetDict({
#     train: Dataset({
#         features: ['original_wav', 'normalized_wav', 'speaker_id', 'transcription'],
#         num_rows: 907
#     })
#     test: Dataset({
#         features: ['original_wav', 'normalized_wav', 'speaker_id', 'transcription'],
#         num_rows: 100
#     })
# })
```

### Data Statistics

| Type      | Part            | Gender     | Speaker Origin | Duration (hrs) | Text Source                  |
|-----------|-----------------|------------|----------------|----------------|------------------------------|
| Human     | ArVoice Part 1  | M          | Egypt          | 1.17           | Tashkeela                    |
|           |                 | F          | Jordan         | 1.45           |                              |
|           |                 | M          | Egypt          | 1.58           |                              |
|           |                 | F          | Morocco        | 1.23           |                              |
|           | ArVoice Part 2  | M          | Palestine      | 0.93           | Khaleej                      |
|           |                 | F          | Egypt          | 0.93           |                              |
|           | ArVoice Part 3  | M          | Syria          | 2.69           | ASC                          |
| Synthetic | ArVoice Part 4  | 2×M, 2×F   | -              | 73.5           | Tashkeela, Khaleej, ASC      |

License: [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)

### Citation

```
@inproceedings{toyin25_interspeech,
  title     = {{ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis}},
  author    = {Hawau Toyin and Rufael Marew and Humaid Alblooshi and Samar M. Magdy and Hanan Aldarmaki},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {4808--4812},
  doi       = {10.21437/Interspeech.2025-1550},
  issn      = {2958-1796},
}
```
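
Because the transcriptions are fully diacritized, one way to prepare inputs for the speech-based diacritic restoration task mentioned above is to strip the diacritics from each `transcription` and keep the original as the target. The following is a minimal sketch and not part of the dataset's own tooling: the helper names `strip_diacritics` and `restoration_pairs`, the sample row, and the choice of code points to remove (U+064B through U+0652, plus U+0670) are assumptions made for illustration.

```python
import re

# Assumption: the diacritics to remove are the Arabic harakat
# (U+064B fathatan through U+0652 sukun) plus superscript alef (U+0670).
_HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return _HARAKAT.sub("", text)


def restoration_pairs(rows):
    """Yield (undiacritized, diacritized) text pairs from dataset-like rows."""
    for row in rows:
        target = row["transcription"]
        yield strip_diacritics(target), target


# Hypothetical row that reuses the dataset's field names;
# the transcription is the word "kataba" written with fatha marks.
sample = [{"speaker_id": "spk0", "transcription": "\u0643\u064E\u062A\u064E\u0628\u064E"}]
pairs = list(restoration_pairs(sample))
```

The same generator can be run over a loaded split, for example `restoration_pairs(df["train"])`, since each row is a mapping with a `transcription` field.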

Application scenarios: