Elyadata/TunArTTS
Source: Hugging Face. Updated 2024-11-26; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/Elyadata/TunArTTS
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- text-to-speech
pretty_name: TunArTTS
---
# Dataset Description
This speech corpus is extracted from Derja Ninja, an online English-Tunisian Arabic dictionary, and provides a resource for linguistic and speech-related research.
The dataset contains over 3 hours of audio from a single male speaker, sampled at 44.1 kHz.
Key characteristics of the corpus include:
- **Language**: Tunisian Arabic.
- **Speaker**: Single male speaker.
- **Sampling Rate**: High-quality recordings at 44.1 kHz.
- **Manual Diacritization**: All text has been processed and manually diacritized, ensuring phonetic accuracy for Tunisian Arabic.
This corpus is well-suited for applications such as speech synthesis and automatic speech recognition.
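
As a starting point for such applications, the corpus can presumably be pulled from the Hugging Face Hub with the `datasets` library. The sketch below is a minimal, unverified example: the repository ID comes from this card, while the helper name `load_tunartts` is hypothetical, and the split and column names are not documented here, so inspect the returned object before relying on any of them.

```python
def load_tunartts(repo_id: str = "Elyadata/TunArTTS"):
    """Download the corpus from the Hugging Face Hub and return it.

    `load_tunartts` is a hypothetical helper name, not part of any library.
    """
    # Imported lazily so the helper can be defined without `datasets` installed.
    from datasets import load_dataset

    # Splits and column names are not stated on this card (assumption: the
    # default configuration loads without extra arguments).
    return load_dataset(repo_id)


# Example usage (requires network access and `pip install datasets`):
#   corpus = load_tunartts()
#   print(corpus)  # shows the available splits and column names
```

Note that the card's license is cc-by-nc-4.0, so any use of the downloaded audio should stay within non-commercial terms.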
# Dataset Characteristics
| **Characteristic** | **Value** |
|-----------------------------|--------------------------------|
| Total Segments | 1493 |
| Total Words | 20925 |
| Total Characters | 113221 |
| Total Duration | 3 hours and 32 seconds |
| Mean Clip Duration | 7.24 seconds |
| Min Clip Duration | 3.11 seconds |
| Max Clip Duration | 16.3 seconds |
| Mean Words per Clip | 14.015 |
| Distinct Words | 4491 |
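
The derived figures in the table can be cross-checked from the totals. The short sketch below uses only the values reported above; the variable names are illustrative, and the words-per-minute figure is an added back-of-the-envelope estimate that includes any silence in the clips.

```python
# Totals copied from the Dataset Characteristics table.
total_segments = 1493
total_words = 20925
total_characters = 113221
total_duration_s = 3 * 3600 + 32  # "3 hours and 32 seconds"

# Ratios derived from the totals.
words_per_clip = total_words / total_segments
chars_per_word = total_characters / total_words
seconds_per_clip = total_duration_s / total_segments
words_per_minute = total_words / (total_duration_s / 60)

print(f"Mean words per clip:    {words_per_clip:.3f}")  # table reports 14.015
print(f"Mean characters/word:   {chars_per_word:.2f}")
print(f"Mean clip duration (s): {seconds_per_clip:.2f}")  # table reports 7.24
print(f"Approx. words/minute:   {words_per_minute:.1f}")
```

The recomputed words-per-clip value matches the table. The recomputed mean clip duration lands within a few hundredths of a second of the reported 7.24 seconds, presumably because the published totals are rounded.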

A research paper based on this dataset has been published: [https://aclanthology.org/2024.lrec-main.1467.pdf](https://aclanthology.org/2024.lrec-main.1467.pdf).

Provider:
Elyadata



