ArVoice
ModelScope community. Updated 2025-12-18; indexed 2025-05-24.
Download link:
https://modelscope.cn/datasets/MBZUAI/ArVoice
Resource overview:
<h2 align="center">
<b>ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis</b>
</h2>
<p align="center"> Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki </p>
<p align="center"> {hawau.toyin, hanan.aldarmaki}@mbzuai.ac.ae </p>
<div style="font-size: 16px; text-align: justify;">
<p>ArVoice is a multi-speaker Modern Standard Arabic (MSA) speech corpus with fully diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. <br>
ArVoice comprises: (1) audio professionally recorded by 2 male and 2 female voice artists from diacritized transcripts, (2) audio professionally recorded by 1 male and 1 female voice artist from undiacritized transcripts, (3) a modified subset of the
Arabic Speech Corpus (ASC), and (4) speech synthesized with commercial TTS systems. The complete corpus totals 83.52 hours of speech across 11 voices; around 10 hours are human speech from 7 speakers. <br> <br>
<strong> This repo contains only Part (3), the ASC subset, and Part (4), the synthetic subset </strong>; to access the main subset, Parts (1) and (2), which consists of six professional speakers, <a href="https://huggingface.co/datasets/MBZUAI/ArVoice/resolve/main/ArVoice%20DUA.pdf"> please sign this agreement</a> and email it to us.
<br><br> If you use the dataset or the transcriptions provided on Hugging Face, <u>please cite the paper</u>.
</p>
</div>
### Usage Example
```python
from datasets import load_dataset

# Second argument selects the subset; options: Human_3, Synthetic
df = load_dataset("MBZUAI/ArVoice", "Human_3")
print(df)
# Output:
# DatasetDict({
#     train: Dataset({
#         features: ['original_wav', 'normalized_wav', 'speaker_id', 'transcription'],
#         num_rows: 907
#     })
#     test: Dataset({
#         features: ['original_wav', 'normalized_wav', 'speaker_id', 'transcription'],
#         num_rows: 100
#     })
# })
```
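Once a split is loaded, a quick sanity check is to count utterances per speaker and look at transcription lengths. The snippet below is a minimal sketch, not part of the official card: `summarize_split` is a hypothetical helper, and the rows are made-up placeholders rather than real ArVoice data. It only assumes the `speaker_id` and `transcription` columns listed in the features above.

```python
from collections import Counter


def summarize_split(speaker_ids, transcriptions):
    """Return per-speaker utterance counts and the mean transcription length in characters."""
    counts = Counter(speaker_ids)
    lengths = [len(text) for text in transcriptions]
    mean_len = sum(lengths) / len(lengths) if lengths else 0.0
    return counts, mean_len


# Placeholder rows for illustration only (not real ArVoice data).
speaker_ids = ["spk_a", "spk_a", "spk_b"]
transcriptions = ["abc", "abcde", "a"]

counts, mean_len = summarize_split(speaker_ids, transcriptions)
print(dict(counts), mean_len)
```

With the real data, the same helper could presumably be called as `summarize_split(df["train"]["speaker_id"], df["train"]["transcription"])`, which reads only the text columns and avoids decoding audio.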
### Data Statistics
| Type | Part | Gender | Speaker Origin | Duration (hrs) | Text Source |
|-----------|-----------------|------------|----------------|----------------|------------------------------|
| Human | ArVoice Part 1 | M | Egypt | 1.17 | Tashkeela |
| | | F | Jordan | 1.45 | |
| | | M | Egypt | 1.58 | |
| | | F | Morocco | 1.23 | |
| | ArVoice Part 2 | M | Palestine | 0.93 | Khaleej |
| | | F | Egypt | 0.93 | |
| | ArVoice Part 3 | M | Syria | 2.69 | ASC |
| Synthetic | ArVoice Part 4 | 2×M, 2×F | - | 73.5 | Tashkeela, Khaleej, ASC |
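The per-speaker durations in the table can be summed to cross-check the totals quoted in the overview. The following is a small sketch that copies the numbers from the table; the variable names are illustrative only.

```python
# Durations in hours, copied from the Data Statistics table.
human_hours = {
    "Part 1": [1.17, 1.45, 1.58, 1.23],
    "Part 2": [0.93, 0.93],
    "Part 3": [2.69],
}
synthetic_hours = 73.5
synthetic_voices = 4  # 2 male + 2 female synthetic voices

per_part = {part: round(sum(hours), 2) for part, hours in human_hours.items()}
human_total = round(sum(per_part.values()), 2)
overall_total = round(human_total + synthetic_hours, 2)
human_speakers = sum(len(hours) for hours in human_hours.values())
total_voices = human_speakers + synthetic_voices

print(per_part)
print(human_total, overall_total, human_speakers, total_voices)
```

The human parts sum to 9.98 hours over 7 speakers, consistent with "around 10 hours", and the voice count comes to 11. The table total is 83.48 hours, slightly below the stated 83.52; the small gap is plausibly due to rounding of the per-row durations.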
License: [https://creativecommons.org/licenses/by/4.0/](https://creativecommons.org/licenses/by/4.0/)
### Citation
```
@inproceedings{toyin25_interspeech,
title = {{ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis}},
author = {Hawau Toyin and Rufael Marew and Humaid Alblooshi and Samar M. Magdy and Hanan Aldarmaki},
year = {2025},
booktitle = {{Interspeech 2025}},
pages = {4808--4812},
doi = {10.21437/Interspeech.2025-1550},
issn = {2958-1796},
}
```
Provider:
maas
Created:
2025-05-21



