surindersinghssj/gurbani-asr-v2-test

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/surindersinghssj/gurbani-asr-v2-test
Resource description:
---
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  - name: whisper_raw
    dtype: string
  - name: canonical_line
    dtype: string
  - name: word_mapping
    dtype: string
  - name: segment_id
    dtype: string
  - name: recording_id
    dtype: string
  - name: mapped_ratio
    dtype: float32
  - name: line_match_method
    dtype: string
  - name: tuk_index
    dtype: int32
  - name: start
    dtype: float32
  - name: end
    dtype: float32
  - name: duration
    dtype: float32
  - name: avg_confidence
    dtype: float32
  - name: pipeline
    dtype: string
  - name: ang
    dtype: int32
  - name: style_bucket
    dtype: string
  - name: artist_name
    dtype: string
  - name: shabad_id
    dtype: int64
  - name: raag
    dtype: string
  - name: writer
    dtype: string
  splits:
  - name: train_0ddbc5f609d8cfbc
    num_bytes: 13991985
    num_examples: 88
  - name: train_122e62cf9d6207ab
    num_bytes: 9681683
    num_examples: 73
  - name: train_0bfc4bdb73a271fc
    num_bytes: 4463939
    num_examples: 25
  - name: train_9a2c6bf5c74f6d1e
    num_bytes: 18731407
    num_examples: 181
  - name: train_a366094c28224cf2
    num_bytes: 12276974
    num_examples: 119
  - name: train_7dd95734d50d486e
    num_bytes: 8832083
    num_examples: 54
  - name: train_81f6d8ffba8c5c49
    num_bytes: 15721489
    num_examples: 132
  - name: train_4f0569b09238ff63
    num_bytes: 14663399
    num_examples: 103
  - name: train_99321a199e5fe0b4
    num_bytes: 24664613.0
    num_examples: 165
  - name: train_2de05fd07b63ef79
    num_bytes: 37269153.0
    num_examples: 277
  download_size: 159862899
  dataset_size: 160296725.0
configs:
- config_name: default
  data_files:
  - split: train_0ddbc5f609d8cfbc
    path: data/train_0ddbc5f609d8cfbc-*
  - split: train_122e62cf9d6207ab
    path: data/train_122e62cf9d6207ab-*
  - split: train_0bfc4bdb73a271fc
    path: data/train_0bfc4bdb73a271fc-*
  - split: train_9a2c6bf5c74f6d1e
    path: data/train_9a2c6bf5c74f6d1e-*
  - split: train_a366094c28224cf2
    path: data/train_a366094c28224cf2-*
  - split: train_7dd95734d50d486e
    path: data/train_7dd95734d50d486e-*
  - split: train_81f6d8ffba8c5c49
    path: data/train_81f6d8ffba8c5c49-*
  - split: train_4f0569b09238ff63
    path: data/train_4f0569b09238ff63-*
  - split: train_99321a199e5fe0b4
    path: data/train_99321a199e5fe0b4-*
  - split: train_2de05fd07b63ef79
    path: data/train_2de05fd07b63ef79-*
language:
- pa
task_categories:
- automatic-speech-recognition
---

# Gurbani ASR v2 Test Dataset

Forced-aligned Gurbani kirtan audio segments for training a Gurbani-only ASR model.

## Contents

Each row is a single tuk (line of Gurbani) extracted from SikhNet kirtan recordings via forced alignment against canonical STTM text.

| Column | Description |
|--------|-------------|
| `audio` | 16 kHz mono audio segment |
| `sentence` | Ground-truth Gurmukhi text (from STTM database) |
| `whisper_raw` | Raw Whisper transcription (before alignment) |
| `canonical_line` | Canonical tuk from STTM |
| `word_mapping` | Word-level alignment mapping |
| `mapped_ratio` | Fraction of canonical words matched |
| `start` / `end` / `duration` | Segment timestamps (seconds) |
| `avg_confidence` | Mean Whisper word confidence |
| `pipeline` | Alignment pipeline used |
| `ang` | Ang (page) number in Sri Guru Granth Sahib Ji |
| `shabad_id` | STTM shabad ID |
| `raag` | Raag of the shabad |
| `writer` | Author (Guru/Bhagat) |
| `artist_name` | Kirtan artist |
| `style_bucket` | Singing style category |
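
## Example: loading and filtering (sketch)

The card defines no single `train` split; the data is spread over ten hash-named splits. The following is a minimal sketch, not part of the original card, of one way to merge those splits with the Hugging Face `datasets` library and keep only well-aligned rows. The helper names (`load_all_splits`, `is_well_aligned`) and all threshold values are illustrative assumptions; the card does not recommend any particular cut-offs.

```python
from typing import Any, Mapping

REPO_ID = "surindersinghssj/gurbani-asr-v2-test"


def load_all_splits(repo_id: str = REPO_ID):
    """Download every split and concatenate them into one Dataset.

    Requires the `datasets` package and network access, so the import
    is deferred until the function is actually called.
    """
    from datasets import concatenate_datasets, load_dataset

    splits = load_dataset(repo_id)  # DatasetDict keyed by split name
    return concatenate_datasets(list(splits.values()))


def is_well_aligned(
    row: Mapping[str, Any],
    min_mapped_ratio: float = 0.9,  # assumed threshold, not from the card
    min_confidence: float = 0.8,    # assumed threshold, not from the card
    max_duration: float = 30.0,     # assumed cap in seconds
) -> bool:
    """Return True if a row passes simple alignment-quality checks."""
    return (
        row["mapped_ratio"] >= min_mapped_ratio
        and row["avg_confidence"] >= min_confidence
        and 0.0 < row["duration"] <= max_duration
    )


# Offline demonstration on hand-written rows (values are made up).
sample_rows = [
    {"mapped_ratio": 1.0, "avg_confidence": 0.95, "duration": 4.2},
    {"mapped_ratio": 0.5, "avg_confidence": 0.95, "duration": 4.2},
    {"mapped_ratio": 1.0, "avg_confidence": 0.40, "duration": 4.2},
]
kept = [r for r in sample_rows if is_well_aligned(r)]
print(f"kept {len(kept)} of {len(sample_rows)} sample rows")
```

With network access, the same predicate can be passed to the merged dataset, for example `load_all_splits().filter(is_well_aligned)`. Suitable thresholds depend on the downstream model and should be tuned against the actual distribution of `mapped_ratio` and `avg_confidence`.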
Provider:
surindersinghssj