ClArTTS
Source: ModelScope community (updated 2025-11-07, indexed 2025-03-22)
Download link: https://modelscope.cn/datasets/MBZUAI/ClArTTS
## Dataset Summary
We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook, then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker, sampled at 40,100 Hz (40.1 kHz).
## Dataset Description
- **Homepage:** [ClArTTS](http://www.clartts.com/)
- **Paper:** [ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus](https://www.isca-archive.org/interspeech_2023/kulkarni23_interspeech.pdf)
## Dataset Structure
A typical data point comprises the name of the audio file (`file`), its transcription (`text`), and the audio as an array (`audio`). Two additional fields give the sampling rate (`sampling_rate`) and the audio duration (`duration`).
```
DatasetDict({
    train: Dataset({
        features: ['text', 'file', 'audio', 'sampling_rate', 'duration'],
        num_rows: 9500
    })
    test: Dataset({
        features: ['text', 'file', 'audio', 'sampling_rate', 'duration'],
        num_rows: 205
    })
})
```
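As a rough illustration of how the five fields relate to one another, the sketch below checks that a record carries all of them and that `duration` agrees with the audio length divided by the sampling rate. It rests on assumptions that the card does not state: that `audio` is a flat sequence of samples and that `duration` is measured in seconds. The helper `check_record` and the synthetic `example` record are hypothetical and are not part of the dataset or of any library.

```python
from typing import Any, Mapping

# Field names as listed in the dataset's `features`.
EXPECTED_FIELDS = ("text", "file", "audio", "sampling_rate", "duration")


def check_record(record: Mapping[str, Any], tolerance_s: float = 0.01) -> bool:
    """Return True if `duration` matches len(audio) / sampling_rate.

    Assumes `audio` is a flat sequence of samples and `duration` is in
    seconds; neither is stated on the dataset card.
    """
    missing = [name for name in EXPECTED_FIELDS if name not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    computed_s = len(record["audio"]) / record["sampling_rate"]
    return abs(computed_s - record["duration"]) <= tolerance_s


# Synthetic record for illustration only (not taken from the corpus).
example = {
    "text": "placeholder transcription",
    "file": "example.wav",
    "audio": [0.0] * 80200,  # 2 seconds of silence at 40100 samples per second
    "sampling_rate": 40100,
    "duration": 2.0,
}

print(check_record(example))  # prints True
```

A check of this kind could be run over each split after loading it, for example to spot records whose stored duration has drifted from the actual audio length.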
### Citation Information
```
@inproceedings{kulkarni2023clartts,
author={Ajinkya Kulkarni and Atharva Kulkarni and Sara Abedalmon'em Mohammad Shatnawi and Hanan Aldarmaki},
title={ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus},
year={2023},
booktitle={2023 INTERSPEECH},
pages={5511--5515},
doi={10.21437/Interspeech.2023-2224}
}
```
Provider: maas
Created: 2025-03-17