
ClArTTS

ModelScope community | Updated: 2025-11-07 | Indexed: 2025-03-22
Download link:
https://modelscope.cn/datasets/MBZUAI/ClArTTS
Resource description:
## Dataset Summary

We present a speech corpus for Classical Arabic Text-to-Speech (ClArTTS) to support the development of end-to-end TTS systems for Arabic. The speech is extracted from a LibriVox audiobook, then processed, segmented, and manually transcribed and annotated. The final ClArTTS corpus contains about 12 hours of speech from a single male speaker, sampled at 40,100 Hz.

## Dataset Description

- **Homepage:** [ClArTTS](http://www.clartts.com/)
- **Paper:** [ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus](https://www.isca-archive.org/interspeech_2023/kulkarni23_interspeech.pdf)

## Dataset Structure

A typical data point comprises the name of the audio file (`file`), its transcription (`text`), and the audio as an array (`audio`). Two additional fields give the sampling rate (`sampling_rate`) and the audio duration (`duration`).

```
DatasetDict({
    train: Dataset({
        features: ['text', 'file', 'audio', 'sampling_rate', 'duration'],
        num_rows: 9500
    })
    test: Dataset({
        features: ['text', 'file', 'audio', 'sampling_rate', 'duration'],
        num_rows: 205
    })
})
```

### Citation Information

```
@inproceedings{kulkarni2023clartts,
  author={Ajinkya Kulkarni and Atharva Kulkarni and Sara Abedalmon'em Mohammad Shatnawi and Hanan Aldarmaki},
  title={ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus},
  year={2023},
  booktitle={2023 INTERSPEECH},
  pages={5511--5515},
  doi={10.21437/Interspeech.2023-2224}
}
```
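
To make the record layout listed under Dataset Structure concrete, here is a minimal Python sketch that checks a record against the five listed field names and relates the split sizes to the stated corpus length. It is an illustration only, not part of the dataset's tooling. The helper names (`missing_fields`, `total_hours`, `mean_clip_seconds`) and the sample record are hypothetical. The assumption that `duration` is measured in seconds is not confirmed by the dataset card.

```python
from typing import Any, Dict, Iterable, Set

# Field names copied from the feature list in the dataset card.
EXPECTED_FIELDS = {"text", "file", "audio", "sampling_rate", "duration"}

# Split sizes copied from the dataset card.
SPLIT_ROWS = {"train": 9500, "test": 205}


def missing_fields(record: Dict[str, Any]) -> Set[str]:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


def total_hours(records: Iterable[Dict[str, Any]]) -> float:
    """Sum the 'duration' field (ASSUMED to be in seconds) and convert to hours."""
    return sum(float(r["duration"]) for r in records) / 3600.0


def mean_clip_seconds(corpus_hours: float, num_rows: int) -> float:
    """Average clip length implied by a corpus length and a row count."""
    return corpus_hours * 3600.0 / num_rows


if __name__ == "__main__":
    # Hypothetical record: every value here is invented for illustration.
    sample = {
        "text": "<transcription>",
        "file": "example.wav",
        "audio": [0.0, 0.01, -0.02],
        "sampling_rate": 40100,
        "duration": 4.5,
    }
    print(missing_fields(sample))  # empty set: all five fields are present

    # About 12 hours spread over 9500 + 205 rows gives roughly 4.45 s per clip.
    rows = sum(SPLIT_ROWS.values())
    print(rows, round(mean_clip_seconds(12.0, rows), 2))
```

The average of roughly 4.45 seconds per clip is only a consistency check derived from the "about 12 hours" figure and the row counts. The actual distribution of clip lengths is not given in the card.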

Provider:
maas
Created:
2025-03-17