Paytmlabs/S2R_Shrutilipi_hindi
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Paytmlabs/S2R_Shrutilipi_hindi
Description:
---
language:
- hi
license: unknown
pretty_name: S2R Shrutilipi Hindi
task_categories:
- automatic-speech-recognition
tags:
- audio
- speech
- hindi
- automatic-speech-recognition
configs:
- config_name: hindi_text_samples
  data_files:
  - split: train
    path: data/preview/train-00000-of-00001.parquet
- config_name: hindi
  data_files:
  - split: train
    path: data/train/train-*.parquet
  - split: validation
    path: data/validation/validation-*.parquet
---
# Paytmlabs/S2R_Shrutilipi_hindi
Hindi speech dataset prepared from [ai4bharat/Shrutilipi](https://huggingface.co/datasets/ai4bharat/Shrutilipi) for Ultravox training.
## Viewing samples on Hugging Face
The **`hindi`** config stores **audio** inside the Parquet files. The Hugging Face **dataset viewer** often cannot decode this and shows **no rows**.
To inspect examples in the browser, open the **Subset** (config) drop-down and choose **`hindi_text_samples`**, which contains only the `text` and `continuation` columns (~2000 rows).
**Ultravox training** should continue to use the **`hindi`** subset (full audio).
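The same choice between subsets can be made programmatically. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed; the helper names `check_config` and `load_split` are hypothetical and not part of this dataset or of Ultravox. The config and split names are taken from the YAML header of this card.

```python
REPO_ID = "Paytmlabs/S2R_Shrutilipi_hindi"

# Config -> splits, as declared in this card's YAML header.
CONFIG_SPLITS = {
    "hindi_text_samples": ("train",),
    "hindi": ("train", "validation"),
}


def check_config(config: str, split: str) -> None:
    """Raise ValueError for a config/split pair this card does not declare."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown config {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"config {config!r} has no split {split!r}")


def load_split(config: str = "hindi", split: str = "train", streaming: bool = True):
    """Load one split; streaming avoids downloading every audio shard up front."""
    check_config(config, split)
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config, split=split, streaming=streaming)
```

For example, `load_split("hindi_text_samples", streaming=False)` would fetch the text-only preview, while `next(iter(load_split("hindi")))` would yield the first full row, including audio. Decoding the `audio` column may require additional audio dependencies, depending on the installed `datasets` version.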
## Schema (config: hindi)
| Column | Type | Description |
|---|---|---|
| `audio` | Audio | Speech audio |
| `text` | string | Verbatim transcript |
| `continuation` | string | LLM-generated continuation (≤50 words) |
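A row can be checked against the schema above with a few lines of Python. This is an illustrative sketch only: the function name `find_schema_problems` is hypothetical, and counting words by splitting on whitespace is an assumption, since the card does not say how the 50-word limit was measured. The `audio` column is not checked because its decoded form depends on the loader.

```python
def find_schema_problems(row: dict, max_words: int = 50) -> list[str]:
    """Return a list of human-readable schema problems for one row."""
    problems = []
    # `text` and `continuation` are both documented as strings.
    for column in ("text", "continuation"):
        if not isinstance(row.get(column), str):
            problems.append(f"{column}: expected a string")

    continuation = row.get("continuation")
    if isinstance(continuation, str):
        # Assumption: words are whitespace-separated tokens.
        n_words = len(continuation.split())
        if n_words > max_words:
            problems.append(f"continuation: {n_words} words exceeds {max_words}")
    return problems
```

An empty list means the row matches the documented text columns under these assumptions.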
## Progress
- Train chunks: 73/73
- Validation: done
Provider:
Paytmlabs



