
aman-hf/indic_asr

Hugging Face | Updated 2026-03-06 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/aman-hf/indic_asr
Description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
language:
- hi2
size_categories:
- 10M<n<100M
configs:
- config_name: default
  data_files:
  - split: train
    path: "data/*/*.parquet"
- config_name: hi2
  data_files:
  - split: train
    path: "data/hi2/*.parquet"
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: duration
    dtype: float64
  - name: language
    dtype: string
  - name: source
    dtype: string
---

# Indic ASR Unified Dataset

Unified collection of Indian language ASR datasets for pretraining.

## Stats

- **Total hours:** 10,278
- **Total samples:** 4,732,705
- **Languages:** 1
- **Audio:** 16kHz mono (mixed flac/mp3/wav)

## Languages

| Language | Hours | Samples |
|----------|-------|---------|
| hi2 | 10,278 | 4,732,705 |

## Usage

```python
from datasets import load_dataset

# Load all languages (streaming)
ds = load_dataset("aman-hf/indic_asr", streaming=True, split="train")

# Load a specific language; the config name must match one declared
# in the metadata above (currently only "hi2")
ds_hi = load_dataset("aman-hf/indic_asr", "hi2", streaming=True, split="train")
```

## Schema

- `audio`: Audio bytes (16kHz mono)
- `text`: Transcription text
- `duration`: Duration in seconds
- `language`: ISO language code
- `source`: Original dataset name
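
The `duration` and `source` fields in the schema above make it possible to check how much audio each original dataset contributes without downloading everything. The following is a minimal sketch, not part of the dataset's own tooling: `hours_by_source` and its `limit` parameter are hypothetical names introduced here, and the function only assumes that each sample is a dict with `source` (string) and `duration` (seconds) keys as listed in the schema.

```python
from collections import defaultdict
from itertools import islice
from typing import Any, Dict, Iterable


def hours_by_source(
    samples: Iterable[Dict[str, Any]], limit: int = 1000
) -> Dict[str, float]:
    """Sum `duration` per `source` over the first `limit` samples.

    Hypothetical helper: durations are assumed to be in seconds
    (per the schema) and are converted to hours in the result.
    """
    totals: Dict[str, float] = defaultdict(float)
    # islice stops after `limit` items, so a streaming dataset
    # is only read as far as needed.
    for sample in islice(samples, limit):
        totals[sample["source"]] += sample["duration"] / 3600.0
    return dict(totals)
```

With a streaming dataset loaded as in the Usage section, this could be called as `hours_by_source(ds.select_columns(["source", "duration"]), limit=1000)`, which keeps only the two metadata columns. As a rough sanity check on the result, the published stats imply an average clip length of about 7.8 seconds (10,278 hours × 3600 / 4,732,705 samples).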
Provider:
aman-hf