
Elyadata/TunArTTS

Source: Hugging Face. Updated 2024-11-26. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/Elyadata/TunArTTS
Description:
---
license: cc-by-nc-4.0
task_categories:
- text-to-speech
pretty_name: TunArTTS
---

# Dataset Description:

This speech corpus is extracted from an online English-Tunisian Arabic dictionary Derja Ninja, providing a valuable resource for linguistic and speech-related research. The dataset contains over 3 hours of mono-speaker audio recordings from a male speaker, sampled at 44.1 kHz.

Key characteristics of the corpus include:

- **Language**: Tunisian Arabic.
- **Speaker**: Single male speaker.
- **Sampling Rate**: High-quality recordings at 44.1 kHz.
- **Manual Diacritization**: All text has been processed and manually diacritized, ensuring phonetic accuracy for Tunisian Arabic.

This corpus is well-suited for applications such as speech synthesis and automatic speech recognition.

# Dataset Characteristics

| **Characteristic**  | **Value**              |
|---------------------|------------------------|
| Total Segments      | 1493                   |
| Total Words         | 20925                  |
| Total Characters    | 113221                 |
| Total Duration      | 3 hours and 32 seconds |
| Mean Clip Duration  | 7.24 seconds           |
| Min Clip Duration   | 3.11 seconds           |
| Max Clip Duration   | 16.3 seconds           |
| Mean Words per Clip | 14.015                 |
| Distinct Words      | 4491                   |

A research paper based on this dataset has been published. You can find the paper here: [https://aclanthology.org/2024.lrec-main.1467.pdf](https://aclanthology.org/2024.lrec-main.1467.pdf).
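
The per-clip averages in the Dataset Characteristics table can be cross-checked against the reported totals with a few divisions. The following is a minimal sketch, not part of the dataset or its tooling: the constant names and the `derived_stats` helper are hypothetical, and the only inputs are the figures quoted in the table. The words-per-minute and distinct-to-total ratios are derived here for illustration and do not appear in the original card.

```python
# Figures copied from the Dataset Characteristics table.
TOTAL_SEGMENTS = 1493
TOTAL_WORDS = 20925
DISTINCT_WORDS = 4491
TOTAL_DURATION_S = 3 * 3600 + 32  # "3 hours and 32 seconds"


def derived_stats() -> dict:
    """Recompute per-clip averages from the reported totals (hypothetical helper)."""
    return {
        # Should match the reported "Mean Words per Clip".
        "mean_words_per_clip": TOTAL_WORDS / TOTAL_SEGMENTS,
        # Should be close to the reported "Mean Clip Duration".
        "mean_clip_duration_s": TOTAL_DURATION_S / TOTAL_SEGMENTS,
        # Illustrative extras, not listed in the table.
        "words_per_minute": TOTAL_WORDS / (TOTAL_DURATION_S / 60),
        "distinct_to_total_ratio": DISTINCT_WORDS / TOTAL_WORDS,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.3f}")
```

The recomputed mean words per clip agrees with the reported 14.015. The recomputed mean clip duration lands slightly above the reported 7.24 seconds, which would be consistent with the total duration being a rounded figure, though the card does not say how either number was obtained.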
Provided by:
Elyadata