
Khurram123/Kulliat-e-Iqbal-TTS

Source: Hugging Face. Updated 2026-03-24. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Khurram123/Kulliat-e-Iqbal-TTS
Description:
---
license: apache-2.0
language:
- ur
tags:
- urdu
- tts
- allama-iqbal
- shaheen-ai
pretty_name: Kulliat-e-Iqbal TTS Dataset
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: file_name
    dtype: string
  splits:
  - name: train
    num_examples: 9336
---

<p align="center">
  <br>
  <b style="font-size: 24px;">"تری خودی میں اگر انقلاب ہو پیدا"</b>
  <br>
  <b style="font-size: 18px;">کلیاتِ اقبال: اردو صوتیاتی ڈیٹا سیٹ برائے مصنوعی ذہانت</b>
</p>

<p align="center">
  <img src="logo.png" width="300" alt="Kulliat-e-Iqbal TTS Dataset Logo">
</p>

<h1 align="center">🎙️ Kulliat-e-Iqbal-TTS Dataset (v1.0) 🇵🇰</h1>

**Kulliat-e-Iqbal-TTS** is a high-fidelity Urdu speech dataset specifically curated for training and fine-tuning Text-to-Speech (TTS) models. Part of the **Shaheen AI Project**, it comprises **9,336 high-quality audio samples** paired with precise transcriptions from the masterworks of **Allama Muhammad Iqbal**.

This dataset is designed to enable modern neural speech synthesis models to recite classical Urdu poetry with authentic rhythmic structures and linguistic nuances.

---

## 🌟 Key Highlights

- **Literary Breadth:** Includes verses from *Armaghan-e-Hijaz*, *Bal-e-Jibril*, *Zarb-e-Kaleem*, and the *Iqbal TTS* corpus.
- **TTS Optimized:** All audio normalized to **22050Hz, Mono, 16-bit PCM** for out-of-the-box compatibility with VITS, Piper, Glow-TTS, and Coqui.
- **Linguistic Precision:** Cleaned transcriptions utilizing the specialized vocabulary of Iqbaliyat to ensure high-quality prosody.
- **Large Scale:** Approximately 11.8 hours of validated speech data, making it a premier open-source resource for Urdu poetic TTS.
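
The audio format stated in the highlights above (22,050 Hz, mono, 16-bit PCM) can be spot-checked on any downloaded WAV file before training. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_format` and `matches_card` are hypothetical and not part of the dataset or any TTS toolkit.

```python
import wave

# Format stated in the dataset card: 22,050 Hz, mono, 16-bit (2 bytes per sample).
EXPECTED = {"sample_rate": 22050, "channels": 1, "sample_width_bytes": 2}


def wav_format(path):
    """Read the sample rate, channel count, and sample width from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_card(path):
    """Return True if the file matches the format stated in the dataset card."""
    return wav_format(path) == EXPECTED
```

Calling `matches_card("sample.wav")` on a local file (the path here is a placeholder) returns `False` for any clip that would need resampling or downmixing first.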

---

## 📊 Dataset Composition

The dataset is a consolidated and cleaned merge of four major literary sources:

| Category | Source | Samples | Focus |
| :--- | :--- | :--- | :--- |
| **Armaghan-e-Hijaz** | Armaghan Auto-Whisper | 516 | Persian & Urdu Quatrains |
| **Bal-e-Jibril** | Bal-Jibril Corrected | 2,183 | High-energy Ghazals |
| **Iqbal TTS** | General Collection | 4,987 | Rhythmic Verse Recitation |
| **Zarb-e-Kaleem** | Zarb Corrected | 1,650 | Philosophical Declarative Prose |

---

## 🛠️ Technical Specifications

- **Total Rows:** 9,336
- **Format:** `file_name,text` (Standard CSV)
- **Audio Codec:** WAV (PCM)
- **Sampling Rate:** 22,050 Hz
- **Bit Depth:** 16-bit
- **Channels:** Mono (Single Channel)
- **Language Code:** `ur` (Urdu)

---

## 💎 Features for Researchers

| Feature | Detail |
| :--- | :--- |
| **Total Duration** | ~11.8 Hours |
| **Average Sample Length** | 4.2 Seconds |
| **Metadata Delimiter** | Comma (,) with Headers |
| **Script Support** | Full Nastaliq / UTF-8 Encoding |

---

## 🚀 Quick Start (Hugging Face Datasets)

To use this dataset in your Python environment:

```python
import IPython.display as ipd
from datasets import load_dataset

# Load the dataset from Khurram123
dataset = load_dataset("Khurram123/Kulliat-e-Iqbal-TTS")

# Preview the first sample
sample = dataset["train"][0]
print(sample)

# To listen to audio in a Jupyter Notebook, use the sampling rate
# stored with the decoded clip (22,050 Hz per the dataset card):
ipd.Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])
```
Provided by:
Khurram123