Elyadata/TunArTTS

Name: Elyadata/TunArTTS
Creator: Elyadata
Published: 2024-11-26 14:49:22
License: cc-by-nc-4.0

Source: Hugging Face
Updated: 2024-11-26
Indexed: 2025-04-26

Download link:

https://hf-mirror.com/datasets/Elyadata/TunArTTS

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- text-to-speech
pretty_name: TunArTTS
---

# Dataset Description

This speech corpus is extracted from Derja Ninja, an online English-Tunisian Arabic dictionary, providing a valuable resource for linguistic and speech-related research. The dataset contains over 3 hours of single-speaker audio recordings from a male speaker, sampled at 44.1 kHz.

Key characteristics of the corpus include:

- **Language**: Tunisian Arabic.
- **Speaker**: Single male speaker.
- **Sampling Rate**: High-quality recordings at 44.1 kHz.
- **Manual Diacritization**: All text has been processed and manually diacritized, ensuring phonetic accuracy for Tunisian Arabic.

This corpus is well-suited for applications such as speech synthesis and automatic speech recognition.

# Dataset Characteristics

| **Characteristic**  | **Value**              |
|---------------------|------------------------|
| Total Segments      | 1493                   |
| Total Words         | 20925                  |
| Total Characters    | 113221                 |
| Total Duration      | 3 hours and 32 seconds |
| Mean Clip Duration  | 7.24 seconds           |
| Min Clip Duration   | 3.11 seconds           |
| Max Clip Duration   | 16.3 seconds           |
| Mean Words per Clip | 14.015                 |
| Distinct Words      | 4491                   |

A research paper based on this dataset has been published. You can find the paper here: [https://aclanthology.org/2024.lrec-main.1467.pdf](https://aclanthology.org/2024.lrec-main.1467.pdf).
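The per-clip figures in the characteristics table can be cross-checked from the totals. The following is a minimal sketch, not part of the original dataset card: the function name `derived_stats` and the dictionary keys are illustrative assumptions, the input values are copied from the table, and it is assumed that "3 hours and 32 seconds" means 10,832 seconds. Whether the character total counts spaces or diacritics is not stated, so the characters-per-word figure is only indicative.

```python
# Totals copied from the Dataset Characteristics table.
TOTAL_SEGMENTS = 1493
TOTAL_WORDS = 20925
TOTAL_CHARACTERS = 113221
TOTAL_DURATION_S = 3 * 3600 + 32  # assumption: "3 hours and 32 seconds"


def derived_stats(segments, words, characters, duration_s):
    """Return averages derived from corpus totals (hypothetical helper)."""
    return {
        "mean_words_per_clip": words / segments,
        "mean_chars_per_word": characters / words,
        "mean_clip_duration_s": duration_s / segments,
        "words_per_minute": words / (duration_s / 60),
    }


stats = derived_stats(
    TOTAL_SEGMENTS, TOTAL_WORDS, TOTAL_CHARACTERS, TOTAL_DURATION_S
)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The derived mean words per clip rounds to the table's 14.015. The derived mean clip duration comes out slightly above the reported 7.24 seconds, presumably because the published total duration is rounded.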

Provider:

Elyadata
