ahmedjaved812/urdu-tts-corpus
Hugging Face · Updated 2026-04-09 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/ahmedjaved812/urdu-tts-corpus
Description:
---
dataset_info:
  features:
  - name: hash_id
    dtype: string
  - name: text
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: duration_ms
    dtype: int64
  - name: speaker_embeddings
    list: float64
  - name: src
    dtype: string
  splits:
  - name: train
    num_bytes: 9410474987.721
    num_examples: 122313
  download_size: 9219789160
  dataset_size: 9410474987.721
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- text-to-speech
- text-to-audio
language:
- ur
pretty_name: Urdu TTS Corpus
size_categories:
- 100K<n<1M
---
## Urdu TTS Corpus
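This dataset is a curated collection of Urdu speech-text pairs for training Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models. It consolidates four existing Urdu speech datasets, listed below, into a single standardized schema: 16 kHz audio paired with its transcript, plus `duration_ms`, `speaker_embeddings`, and `src` fields.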
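As one illustration of preparing the data for TTS training, the sketch below keeps only rows with a non-blank `text` and a `duration_ms` inside a length window. The 1 to 15 second bounds and the function name are hypothetical choices for illustration, not recommendations from the card; the field names come from the `dataset_info` schema.

```python
from typing import Any, Dict

MIN_MS = 1_000   # hypothetical lower bound (1 s), not taken from the card
MAX_MS = 15_000  # hypothetical upper bound (15 s), not taken from the card


def is_usable_for_tts(row: Dict[str, Any], min_ms: int = MIN_MS, max_ms: int = MAX_MS) -> bool:
    """Keep clips with a non-blank transcript and a duration inside [min_ms, max_ms]."""
    has_text = bool(row["text"].strip())
    in_window = min_ms <= row["duration_ms"] <= max_ms
    return has_text and in_window
```

On a loaded split, `ds.filter(is_usable_for_tts)` applies the check row by row.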
### Dataset Description
- **Language:** Urdu (ur-PK)
- **Sampling Rate:** 16,000 Hz
- **Format:** Hugging Face `datasets` (Audio + Text)
- **Size:** 122,313 examples in a single `train` split (about 9.4 GB)
- **Total Sources:** 4
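A minimal loading sketch, assuming the standard `datasets.load_dataset` entry point and the single `train` split declared in the card's `configs`. Streaming avoids fetching the full ~9.2 GB download up front. The helper names are hypothetical, and `expected_num_samples` assumes `duration_ms` is the clip length in milliseconds, so at 16,000 Hz each millisecond corresponds to 16 samples.

```python
from typing import Any, Dict

REPO_ID = "ahmedjaved812/urdu-tts-corpus"
SAMPLING_RATE = 16_000  # Hz, as declared for the `audio` feature in the card


def load_train_split(streaming: bool = True):
    """Load the single `train` split from the Hub.

    With streaming=True an IterableDataset is returned and rows are fetched
    lazily instead of downloading the full ~9.2 GB up front.
    """
    # Imported inside the function so the pure helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=streaming)


def expected_num_samples(row: Dict[str, Any], sampling_rate: int = SAMPLING_RATE) -> int:
    """Samples implied by `duration_ms`, assuming it is the clip length in ms."""
    return round(row["duration_ms"] * sampling_rate / 1000)
```

For example, `row = next(iter(load_train_split()))` fetches a single record; for a clip whose `duration_ms` is 2,500, `expected_num_samples` returns 40,000, which the decoded audio length should roughly match.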
### Source Attribution
This corpus merges the following datasets:
1. **gondal_urdu_tts:** [muhammadsaadgondal/urdu-tts](https://huggingface.co/datasets/muhammadsaadgondal/urdu-tts/)
2. **urdu_tts_16k:** [codewithdark/urdu-tts-16000Hz](https://huggingface.co/datasets/codewithdark/urdu-tts-16000Hz/)
3. **mozilla_cv_urdu_24:** [Mozilla Foundation](https://datacollective.mozillafoundation.org/datasets/cmj8u3pz600t9nxxbz9l2ck2n)
4. **urdu_tts_fast:** [codewithdark/urdu-tts-fast](https://huggingface.co/datasets/codewithdark/urdu-tts-fast)
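Because every row carries a `src` string and a `duration_ms` integer, the contribution of each source can be tallied in one pass. The sketch below assumes the `src` values are the identifiers listed above (for example `mozilla_cv_urdu_24`); the card does not state this mapping, so inspect the actual values before relying on it. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator

MS_PER_HOUR = 3_600_000


def hours_per_source(rows: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """Sum `duration_ms` for each distinct `src` value and convert to hours."""
    totals_ms: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals_ms[row["src"]] += row["duration_ms"]
    return {src: ms / MS_PER_HOUR for src, ms in totals_ms.items()}


def rows_from_source(rows: Iterable[Dict[str, Any]], src: str) -> Iterator[Dict[str, Any]]:
    """Yield only rows whose `src` equals `src`.

    Assumption: `src` holds identifiers such as "mozilla_cv_urdu_24";
    check the real values in the data first.
    """
    return (row for row in rows if row["src"] == src)
```

On a loaded split, dropping the audio column first with `ds.remove_columns("audio")` avoids decoding every clip just to read these two fields.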
Provided by:
ahmedjaved812



