surindersinghssj/gurbani-asr-v2-test
Source: Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/surindersinghssj/gurbani-asr-v2-test
Description:
---
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  - name: whisper_raw
    dtype: string
  - name: canonical_line
    dtype: string
  - name: word_mapping
    dtype: string
  - name: segment_id
    dtype: string
  - name: recording_id
    dtype: string
  - name: mapped_ratio
    dtype: float32
  - name: line_match_method
    dtype: string
  - name: tuk_index
    dtype: int32
  - name: start
    dtype: float32
  - name: end
    dtype: float32
  - name: duration
    dtype: float32
  - name: avg_confidence
    dtype: float32
  - name: pipeline
    dtype: string
  - name: ang
    dtype: int32
  - name: style_bucket
    dtype: string
  - name: artist_name
    dtype: string
  - name: shabad_id
    dtype: int64
  - name: raag
    dtype: string
  - name: writer
    dtype: string
  splits:
  - name: train_0ddbc5f609d8cfbc
    num_bytes: 13991985
    num_examples: 88
  - name: train_122e62cf9d6207ab
    num_bytes: 9681683
    num_examples: 73
  - name: train_0bfc4bdb73a271fc
    num_bytes: 4463939
    num_examples: 25
  - name: train_9a2c6bf5c74f6d1e
    num_bytes: 18731407
    num_examples: 181
  - name: train_a366094c28224cf2
    num_bytes: 12276974
    num_examples: 119
  - name: train_7dd95734d50d486e
    num_bytes: 8832083
    num_examples: 54
  - name: train_81f6d8ffba8c5c49
    num_bytes: 15721489
    num_examples: 132
  - name: train_4f0569b09238ff63
    num_bytes: 14663399
    num_examples: 103
  - name: train_99321a199e5fe0b4
    num_bytes: 24664613.0
    num_examples: 165
  - name: train_2de05fd07b63ef79
    num_bytes: 37269153.0
    num_examples: 277
  download_size: 159862899
  dataset_size: 160296725.0
configs:
- config_name: default
  data_files:
  - split: train_0ddbc5f609d8cfbc
    path: data/train_0ddbc5f609d8cfbc-*
  - split: train_122e62cf9d6207ab
    path: data/train_122e62cf9d6207ab-*
  - split: train_0bfc4bdb73a271fc
    path: data/train_0bfc4bdb73a271fc-*
  - split: train_9a2c6bf5c74f6d1e
    path: data/train_9a2c6bf5c74f6d1e-*
  - split: train_a366094c28224cf2
    path: data/train_a366094c28224cf2-*
  - split: train_7dd95734d50d486e
    path: data/train_7dd95734d50d486e-*
  - split: train_81f6d8ffba8c5c49
    path: data/train_81f6d8ffba8c5c49-*
  - split: train_4f0569b09238ff63
    path: data/train_4f0569b09238ff63-*
  - split: train_99321a199e5fe0b4
    path: data/train_99321a199e5fe0b4-*
  - split: train_2de05fd07b63ef79
    path: data/train_2de05fd07b63ef79-*
language:
- pa
task_categories:
- automatic-speech-recognition
---
# Gurbani ASR v2 Test Dataset
Forced-aligned Gurbani kirtan audio segments for training a Gurbani-only ASR model.
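The front matter divides the data into ten splits with opaque `train_<hash>` names rather than a single `train` split, and the card does not explain the hashes. As a quick sanity check, the sketch below re-adds the per-split `num_bytes` and `num_examples` values copied from the YAML and compares the byte total with `dataset_size`. The `SPLITS` dictionary and the `summarize` helper are illustrative names, not part of the dataset.

```python
# Per-split (num_bytes, num_examples) copied from the card's YAML front matter.
SPLITS = {
    "train_0ddbc5f609d8cfbc": (13_991_985, 88),
    "train_122e62cf9d6207ab": (9_681_683, 73),
    "train_0bfc4bdb73a271fc": (4_463_939, 25),
    "train_9a2c6bf5c74f6d1e": (18_731_407, 181),
    "train_a366094c28224cf2": (12_276_974, 119),
    "train_7dd95734d50d486e": (8_832_083, 54),
    "train_81f6d8ffba8c5c49": (15_721_489, 132),
    "train_4f0569b09238ff63": (14_663_399, 103),
    "train_99321a199e5fe0b4": (24_664_613, 165),
    "train_2de05fd07b63ef79": (37_269_153, 277),
}

DATASET_SIZE = 160_296_725  # `dataset_size` from the card, in bytes


def summarize(splits):
    """Return (total_bytes, total_examples) summed over all splits."""
    total_bytes = sum(num_bytes for num_bytes, _ in splits.values())
    total_examples = sum(count for _, count in splits.values())
    return total_bytes, total_examples


total_bytes, total_examples = summarize(SPLITS)
# The per-split byte counts should add up exactly to dataset_size.
assert total_bytes == DATASET_SIZE
print(f"{len(SPLITS)} splits, {total_examples} segments, {total_bytes / 1e6:.1f} MB")
# prints: 10 splits, 1217 segments, 160.3 MB
```

So the whole test set holds 1,217 tuk segments. To work with them as one set, `datasets.load_dataset("surindersinghssj/gurbani-asr-v2-test")` should return a `DatasetDict` keyed by these ten split names, and `datasets.concatenate_datasets(list(ds.values()))` should merge them, assuming every split shares the single feature schema the card declares.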
## Contents
Each row is a single tuk (line of Gurbani) extracted from SikhNet kirtan recordings via forced alignment against canonical text from STTM (SikhiToTheMax).
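Because each row keeps both the raw Whisper transcription (`whisper_raw`) and the STTM ground truth (`sentence`), a word error rate (WER) between the two gives a rough sense of how far the unaligned Whisper output is from the canonical tuk. A minimal sketch using word-level Levenshtein distance, assuming whitespace tokenisation (the card does not say how `sentence` is tokenised) and using hypothetical placeholder words rather than real rows. The card also does not say whether `whisper_raw` covers exactly the same span as the segment, so treat the number as indicative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Assumes words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word missing from hyp
                curr[j - 1] + 1,         # insertion: extra word in hyp
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(ref)


row = {"sentence": "w1 w2 w3 w4", "whisper_raw": "w1 x2 w3"}  # hypothetical row
print(word_error_rate(row["sentence"], row["whisper_raw"]))  # prints 0.5
```

One substitution (`w2` to `x2`) and one deletion (`w4`) over four reference words give 0.5.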
| Column | Description |
|--------|-------------|
| `audio` | 16 kHz mono audio segment |
| `sentence` | Ground-truth Gurmukhi text (from STTM database) |
| `whisper_raw` | Raw Whisper transcription (before alignment) |
| `canonical_line` | Canonical tuk from STTM |
| `word_mapping` | Word-level alignment mapping |
| `mapped_ratio` | Fraction of canonical words matched |
| `start` / `end` / `duration` | Segment start time, end time, and length (seconds) |
| `avg_confidence` | Mean Whisper word confidence |
| `pipeline` | Alignment pipeline used |
| `ang` | Ang (page) number in Sri Guru Granth Sahib Ji |
| `shabad_id` | STTM shabad ID |
| `raag` | Raag of the shabad |
| `writer` | Author (Guru/Bhagat) |
| `artist_name` | Kirtan artist |
| `style_bucket` | Singing style category |
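The `mapped_ratio` and `avg_confidence` columns lend themselves to dropping weakly aligned segments before training. A minimal sketch over plain row dicts; `keep_row` is a hypothetical helper, and the thresholds (0.9 and 0.5) are illustrative assumptions, not values the card recommends. It also rejects rows whose `end - start` disagrees with `duration`, which assumes `duration` is defined that way (the card only calls the three columns timestamps in seconds).

```python
from typing import Any, Mapping

# Illustrative thresholds: assumptions, not taken from the dataset card.
MIN_MAPPED_RATIO = 0.9
MIN_AVG_CONFIDENCE = 0.5
DURATION_TOLERANCE_S = 0.05  # float32 timestamps rarely match exactly


def keep_row(row: Mapping[str, Any]) -> bool:
    """Hypothetical quality filter for one dataset row."""
    # Assumes duration == end - start; reject rows where they disagree.
    if abs((row["end"] - row["start"]) - row["duration"]) > DURATION_TOLERANCE_S:
        return False
    # Keep only segments where most canonical words were matched
    # and Whisper was reasonably confident on average.
    return (
        row["mapped_ratio"] >= MIN_MAPPED_RATIO
        and row["avg_confidence"] >= MIN_AVG_CONFIDENCE
    )


rows = [  # hypothetical rows with only the columns this filter reads
    {"start": 12.0, "end": 15.5, "duration": 3.5, "mapped_ratio": 1.0, "avg_confidence": 0.82},
    {"start": 40.2, "end": 44.0, "duration": 3.8, "mapped_ratio": 0.6, "avg_confidence": 0.91},
]
kept = [r for r in rows if keep_row(r)]
print(len(kept))  # prints 1: the second row fails the mapped_ratio threshold
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter(keep_row)`, which calls it once per example with that example as a dict.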
Provided by: surindersinghssj



