Khurram123/Kulliat-e-Iqbal-TTS
Hugging Face | Updated: 2026-03-24 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/Khurram123/Kulliat-e-Iqbal-TTS
Resource description:
---
license: apache-2.0
language:
- ur
tags:
- urdu
- tts
- allama-iqbal
- shaheen-ai
pretty_name: Kulliat-e-Iqbal TTS Dataset
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: file_name
    dtype: string
  splits:
  - name: train
    num_examples: 9336
---
<p align="center">
<br>
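<b style="font-size: 24px;">"تری خودی میں اگر انقلاب ہو پیدا"</b> <br>
<i>"If a revolution is born within your selfhood"</i> <br>
<b style="font-size: 18px;">— کلیاتِ اقبال: اردو صوتیاتی ڈیٹا سیٹ برائے مصنوعی ذہانت</b> <br>
<i>Kulliat-e-Iqbal: an Urdu speech dataset for artificial intelligence</i>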
</p>
<p align="center">
<img src="logo.png" width="300" alt="Kulliat-e-Iqbal TTS Dataset Logo">
</p>
<h1 align="center">🎙️ Kulliat-e-Iqbal-TTS Dataset (v1.0) 🇵🇰</h1>

**Kulliat-e-Iqbal-TTS** is an Urdu speech dataset curated for training and fine-tuning Text-to-Speech (TTS) models. Part of the **Shaheen AI Project**, it contains **9,336 audio clips** paired with cleaned transcriptions of verses from the works of **Allama Muhammad Iqbal**.

The dataset is intended to help neural speech synthesis models recite classical Urdu poetry with its rhythm, metre, and pronunciation intact.

---
## 🌟 Key Highlights
- **Literary Breadth:** Includes verses from *Armaghan-e-Hijaz*, *Bal-e-Jibril*, *Zarb-e-Kaleem*, and the *Iqbal TTS* corpus.
- **TTS Optimized:** All audio is normalized to **22,050 Hz, mono, 16-bit PCM** for out-of-the-box compatibility with VITS, Piper, Glow-TTS, and Coqui TTS.
- **Linguistic Precision:** Transcriptions are cleaned while retaining the specialized vocabulary of Iqbal's poetry (Iqbaliyat), supporting accurate pronunciation and prosody.
- **Large Scale:** Approximately 11.8 hours of validated speech, a substantial open-source resource for Urdu poetic TTS.
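
Several TTS toolkits (for example the `ljspeech` formatter in Coqui TTS) read LJSpeech-style metadata: pipe-separated `id|transcription|normalized_transcription` lines with no header, with audio typically stored as `wavs/<id>.wav`. The sketch below converts the `file_name,text` CSV into that layout. It assumes a local CSV with a `file_name,text` header and simply reuses the raw text as the normalized text; the helper `to_ljspeech` and file names such as `metadata.csv` are hypothetical, e.g. `to_ljspeech("metadata.csv", "metadata_ljspeech.csv")`.

```python
import csv
from pathlib import Path


def to_ljspeech(src_csv: str, dst_txt: str) -> int:
    """Rewrite a `file_name,text` CSV as pipe-separated LJSpeech-style lines.

    Hypothetical helper: assumes a header row with `file_name` and `text`
    columns and reuses the raw text as the normalized text.
    Returns the number of rows written.
    """
    count = 0
    with open(src_csv, newline="", encoding="utf-8") as fin, \
         open(dst_txt, "w", encoding="utf-8") as fout:
        for row in csv.DictReader(fin):
            # LJSpeech IDs omit the directory and the .wav extension
            utt_id = Path(row["file_name"]).stem
            # Drop pipes and collapse whitespace so each utterance stays on one line
            text = " ".join(row["text"].replace("|", " ").split())
            fout.write(f"{utt_id}|{text}|{text}\n")
            count += 1
    return count
```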
---
## 📊 Dataset Composition
The dataset is a consolidated and cleaned merge of four major literary sources:
| Category | Source | Samples | Focus |
| :--- | :--- | :--- | :--- |
| **Armaghan-e-Hijaz** | Armaghan Auto-Whisper | 516 | Persian & Urdu Quatrains |
| **Bal-e-Jibril** | Bal-Jibril Corrected | 2,183 | High-energy Ghazals |
| **Iqbal TTS** | General Collection | 4,987 | Rhythmic Verse Recitation |
| **Zarb-e-Kaleem** | Zarb Corrected | 1,650 | Philosophical Declarative Verse |
| **Total** | | **9,336** | |
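
As a quick arithmetic check, the per-source counts sum to exactly 9,336, the dataset's total row count, and the general *Iqbal TTS* collection accounts for just over half of all clips (about 53%). That imbalance may matter if you want to balance recitation styles when sampling training data. A minimal sketch using only the numbers in the table:

```python
# Per-source sample counts, copied from the composition table above
counts = {
    "Armaghan-e-Hijaz": 516,
    "Bal-e-Jibril": 2183,
    "Iqbal TTS": 4987,
    "Zarb-e-Kaleem": 1650,
}

total = sum(counts.values())
assert total == 9336  # matches the reported number of rows

for name, n in counts.items():
    # Share of the whole dataset, as a percentage with one decimal place
    print(f"{name:<18} {n:>5}  {100 * n / total:5.1f}%")
```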
---
## 🛠️ Technical Specifications
- **Total Rows:** 9,336
- **Metadata Format:** Standard CSV with a `file_name,text` header
- **Audio Format:** WAV (PCM)
- **Sampling Rate:** 22,050 Hz
- **Bit Depth:** 16-bit
- **Channels:** Mono (Single Channel)
- **Language Code:** `ur` (Urdu)
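
Because the audio specification is fixed, it is cheap to confirm that downloaded files match it before training. The sketch below uses only Python's standard-library `wave` module; `check_wav` is a hypothetical helper, and it inspects WAV headers only, not the audio content itself.

```python
import wave


def check_wav(path: str, rate: int = 22050, width_bytes: int = 2,
              channels: int = 1) -> list[str]:
    """Return mismatches against the stated spec (an empty list means it conforms).

    Defaults mirror the specification: 22,050 Hz, 16-bit (2-byte) samples, mono.
    """
    problems = []
    # wave.open raises an exception for files it cannot parse as PCM WAV
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate {wf.getframerate()} != {rate}")
        if wf.getsampwidth() != width_bytes:
            problems.append(
                f"bit depth {8 * wf.getsampwidth()} != {8 * width_bytes}")
        if wf.getnchannels() != channels:
            problems.append(f"channels {wf.getnchannels()} != {channels}")
    return problems
```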
---
## 💎 Features for Researchers
| Feature | Detail |
| :--- | :--- |
| **Total Duration** | ~11.8 Hours |
| **Average Sample Length** | 4.2 Seconds |
| **Metadata Delimiter** | Comma (`,`), with header row |
| **Script / Encoding** | Urdu (Perso-Arabic) script, UTF-8 |
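
The duration figures can be recomputed from a local copy of the files. The two published numbers do not match exactly: 9,336 clips at 4.2 s each is about 39,200 s (roughly 10.9 hours), while 11.8 hours would imply an average of about 4.55 s per clip, so measuring directly is worthwhile. The sketch below assumes a local `metadata.csv` (comma-delimited, header row, UTF-8) whose `file_name` values are WAV paths relative to the CSV's folder. The file name, folder layout, and the helper `duration_stats` are assumptions.

```python
import csv
import wave
from pathlib import Path


def duration_stats(metadata_csv: str) -> tuple[int, float, float]:
    """Return (clip count, total hours, mean seconds per clip).

    Assumes a header row with a `file_name` column and WAV paths
    relative to the CSV's directory (a hypothetical local layout).
    """
    root = Path(metadata_csv).parent
    seconds = []
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):  # comma delimiter, header row
            with wave.open(str(root / row["file_name"]), "rb") as wf:
                # Duration = number of frames / frames per second
                seconds.append(wf.getnframes() / wf.getframerate())
    total = sum(seconds)
    mean = total / len(seconds) if seconds else 0.0
    return len(seconds), total / 3600, mean
```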
---
## 🚀 Quick Start (Hugging Face Datasets)
To use this dataset in your Python environment:
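```python
from datasets import load_dataset
import IPython.display as ipd

# Load the dataset from the Hugging Face Hub (only a "train" split is published)
dataset = load_dataset("Khurram123/Kulliat-e-Iqbal-TTS")

# Preview the first sample's file name and transcription
sample = dataset["train"][0]
print(sample["file_name"], sample["text"])

# Listen to the audio in a Jupyter notebook, using the sample's own sampling rate
audio = sample["audio"]
ipd.display(ipd.Audio(audio["array"], rate=audio["sampling_rate"]))
```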
Provided by:
Khurram123



