surindersinghssj/gurbani-kirtan-asr
Hugging Face | Updated 2026-03-25 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/surindersinghssj/gurbani-kirtan-asr
Description:
---
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: transcription
    dtype: string
  - name: source
    dtype: string
  - name: speaker_id
    dtype: string
  - name: video_id
    dtype: string
  - name: caption_type
    dtype: string
  - name: start
    dtype: float64
  - name: end
    dtype: float64
  - name: duration_sec
    dtype: float64
  splits:
  - name: train
    num_examples: 40500
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*.parquet
language:
- pa
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
tags:
- audio
- speech
- gurmukhi
- punjabi
- kirtan
- gurbani
- asr
pretty_name: Gurbani Kirtan ASR
size_categories:
- 10K<n<100K
---
# Gurbani Kirtan ASR Dataset
A high-quality Punjabi (Gurmukhi) speech-to-text dataset built from YouTube kirtan recordings with human-verified and auto-generated Punjabi captions.
## Dataset Details
- **Source:** 128 curated YouTube kirtan videos from 17 artists
- **Language:** Punjabi / Gurmukhi script
- **Audio format:** FLAC, 16 kHz, mono
- **Caption types:** Manual (87 videos) and auto-generated (41 videos)
- **Segment filter:** Gurmukhi script ratio >= 0.65, duration 0.5-30 seconds
- **Shard size:** 500 segments per parquet file
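
The segment filter above can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual preprocessing code: the card does not say how the script ratio is computed, so the sketch assumes it is the fraction of non-whitespace characters that fall in the Gurmukhi Unicode block (U+0A00 to U+0A7F). The function names `gurmukhi_ratio` and `keep_segment` are hypothetical.

```python
def gurmukhi_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Gurmukhi Unicode block.

    Assumption: the card does not define the ratio; this is one plausible reading.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    gurmukhi = sum(1 for c in chars if "\u0A00" <= c <= "\u0A7F")
    return gurmukhi / len(chars)


def keep_segment(text: str, start: float, end: float,
                 min_ratio: float = 0.65,
                 min_dur: float = 0.5, max_dur: float = 30.0) -> bool:
    """Apply the thresholds stated in the card: ratio >= 0.65, duration 0.5-30 s."""
    duration = end - start
    return gurmukhi_ratio(text) >= min_ratio and min_dur <= duration <= max_dur


if __name__ == "__main__":
    print(keep_segment("ਵਾਹਿਗੁਰੂ", 0.0, 2.0))   # fully Gurmukhi, 2 s long
    print(keep_segment("hello world", 0.0, 2.0))  # Latin script only
```

Under this reading, punctuation and digits outside the Gurmukhi block count against the ratio, which may be why the threshold sits below 1.0.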
## Schema
| Column | Type | Description |
|--------|------|-------------|
| audio | Audio | 16 kHz mono FLAC audio |
| transcription | string | Gurmukhi script transcription |
| source | string | Always "youtube" |
| speaker_id | string | Artist/channel name |
| video_id | string | YouTube video ID |
| caption_type | string | "manual" or "auto" |
| start | float64 | Segment start time (seconds) |
| end | float64 | Segment end time (seconds) |
| duration_sec | float64 | Segment duration (seconds) |
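
As a small illustration of how the metadata columns can be used, the sketch below totals audio hours per `caption_type` from `duration_sec`. The helper name `hours_by_caption_type` and the sample rows are hypothetical; the values are made up for the example and are not taken from the dataset.

```python
from collections import defaultdict


def hours_by_caption_type(rows):
    """Sum duration_sec per caption_type and return the totals in hours."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["caption_type"]] += row["duration_sec"]
    return {kind: seconds / 3600.0 for kind, seconds in totals.items()}


# Illustrative rows only (made-up values that follow the schema above).
sample_rows = [
    {"caption_type": "manual", "start": 0.0, "end": 1800.0, "duration_sec": 1800.0},
    {"caption_type": "manual", "start": 1800.0, "end": 3600.0, "duration_sec": 1800.0},
    {"caption_type": "auto", "start": 0.0, "end": 1800.0, "duration_sec": 1800.0},
]

if __name__ == "__main__":
    print(hours_by_caption_type(sample_rows))
```

With the real data, the same function should accept the rows of the `train` split. In recent versions of `datasets`, passing `ds["train"].select_columns(["caption_type", "duration_sec"])` avoids decoding the audio column while iterating.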
## Artists
Bhai Manpreet Singh, Amritt Saagar, Bhai Harjinder Singh (Sri Nagar Wale), Bhai Gurpreet Singh Shimla Wale, Bhai Lakhwinder Singh Hazoori Ragi, Bhai Jujhar Singh Hazuri Ragi, Bhai Satvinder & Harvinder Singh Delhi Wale, SGPC Golden Temple, Bhai Niranjan Singh Jawaddi Kalan, Red Records, Ajit Brar Vlogs, WSYA Sikhi, and more.
## Usage
```python
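from datasets import load_dataset

# Download (or load from cache) the default config, which has a single "train" split.
ds = load_dataset("surindersinghssj/gurbani-kirtan-asr")

# Inspect the first example: audio plus transcription and metadata fields.
print(ds["train"][0])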
```
## Related Datasets
- [gurbani-asr](https://huggingface.co/datasets/surindersinghssj/gurbani-asr) - Sehaj Path (Sri Guru Granth Sahib recitation) ASR dataset
Provider:
surindersinghssj



