
somu9/iisc_mono_hindi_female

Source: Hugging Face. Updated: 2026-04-15. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/somu9/iisc_mono_hindi_female
Resource description:
---
language:
- hi
license: cc-by-4.0
task_categories:
- text-to-speech
tags:
- hindi
- tts
- single-speaker
- female
- iisc
- syspin
- studio-quality
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: domain
    dtype: string
  - name: speaker_id
    dtype: string
  - name: language
    dtype: string
  splits:
  - name: train
    num_examples: 21662
  - name: test
    num_examples: 396
---

# IISc Mono Hindi Female

Studio-quality single-speaker Hindi female TTS dataset from the SYSPIN project by Indian Institute of Science (IISc), Bengaluru.

## Dataset Description

| Property | Value |
|---|---|
| **Source** | [IISc SYSPIN Project](https://syspin.iisc.ac.in/) |
| **Speaker** | Single professional female voice artist (42 yrs, 21 yrs experience) |
| **Language** | Hindi (hi) |
| **Total Duration** | 54 hours 54 minutes 44 seconds |
| **Utterances** | 22,058 (train: 21,662 / test: 396 EVAL domain) |
| **Audio** | 48kHz, 24-bit, mono, embedded in parquet |
| **Recording** | Neumann TLM-103 microphone, professional studio, ~40dB SNR |
| **Domains** | Agriculture, Books, Education, Evaluation, Finance, General, Health, Others, Politics, Weather |

## Domain Distribution

| Domain | Hours | Sentences | Description |
|---|---|---|---|
| BOOK | 23:03:20 | 8,358 | Books |
| OTHE | 7:21:25 | 3,081 | Others |
| GENE | 6:00:43 | 2,540 | General |
| EDUC | 5:04:10 | 2,060 | Education |
| WEAT | 4:22:58 | 1,873 | Weather |
| POLI | 2:57:51 | 1,235 | Politics |
| AGRI | 1:48:18 | 848 | Agriculture |
| HEAL | 1:48:31 | 835 | Health |
| FINA | 1:46:45 | 832 | Finance |
| EVAL | 0:40:38 | 396 | Evaluation (test set) |

## Fields

| Column | Type | Description |
|---|---|---|
| `audio` | Audio (48kHz) | Speech waveform |
| `text` | string | Hindi transcription (Devanagari) |
| `domain` | string | Content domain (BOOK, GENE, etc.) |
| `speaker_id` | string | `hindi_female_spk001` |
| `language` | string | `hi` |

## Splits

- **train**: 21,662 utterances (all domains except EVAL)
- **test**: 396 utterances (EVAL domain, recommended by creators for TTS evaluation)

## Usage

```python
from datasets import load_dataset

ds = load_dataset("somu9/iisc_mono_hindi_female", split="train")

# Listen to first sample
print(ds[0]["text"])
audio = ds[0]["audio"]  # {"array": np.array, "sampling_rate": 48000}
```

## Speaker Metadata

- **Language:** Hindi
- **Gender:** Female
- **Age:** 42
- **Experience:** 21 Years
- **Languages known:** Hindi, English, Tamil
- **Mother tongue:** Hindi

## Recording Setup

- **Microphone:** Neumann TLM-103
- **Environment:** Professional studio
- **Conditions:** Studio quality at ~40dB SNR

## License

This dataset is released under the [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/legalcode.en).

TTS data created under SYSPIN project by Indian Institute of Science, Bengaluru. The copyright in the TTS data belongs to Indian Institute of Science, Bengaluru.

## Acknowledgments

We extend our heartfelt gratitude to the talented voice artist whose contributions were fundamental to this project's success. We are particularly grateful to the project of German Development Cooperation "FAIR Forward - AI for All" for their financial support in developing this TTS corpus, and Bhashini AI Solutions Private Limited for their financial support for part of the corpus beyond 44 hours for every voice artist in developing this TTS corpus.

## Citation

```bibtex
@misc{SYSPIN_S1.0_Corpus,
  Title = {SYSPIN_S1.0 Corpus - A TTS Corpus of 900+ hours in nine Indian Languages},
  Authors = {Abhayjeet Et al.},
  Year = {2025}
}
```

## Contact

SPIRE Lab, EE Dept., IISc, Bengaluru
Email: contact.syspin@iisc.ac.in
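
## Checking the Table Totals

The per-domain rows in the Domain Distribution table can be cross-checked against the headline duration and utterance counts with a few lines of arithmetic. The sketch below is not part of the original dataset card: the `DOMAINS` dictionary and the helper names `to_seconds` and `to_hms` are illustrative, and the figures are copied by hand from the table rather than measured from the audio.

```python
# Figures copied by hand from the Domain Distribution table:
# domain code -> (duration as H:MM:SS, number of sentences).
DOMAINS = {
    "BOOK": ("23:03:20", 8358),
    "OTHE": ("7:21:25", 3081),
    "GENE": ("6:00:43", 2540),
    "EDUC": ("5:04:10", 2060),
    "WEAT": ("4:22:58", 1873),
    "POLI": ("2:57:51", 1235),
    "AGRI": ("1:48:18", 848),
    "HEAL": ("1:48:31", 835),
    "FINA": ("1:46:45", 832),
    "EVAL": ("0:40:38", 396),
}

# Headline figures as stated in the Dataset Description table.
STATED_TOTAL = "54:54:44"
STATED_UTTERANCES = 22058
STATED_TRAIN = 21662


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a whole number of seconds back to H:MM:SS."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_seconds = sum(to_seconds(hms) for hms, _ in DOMAINS.values())
total_sentences = sum(count for _, count in DOMAINS.values())
# The train split is described as every domain except EVAL.
train_sentences = total_sentences - DOMAINS["EVAL"][1]
# Gap between the stated total duration and the sum of the rows.
drift_seconds = to_seconds(STATED_TOTAL) - total_seconds

print("summed duration:  ", to_hms(total_seconds))
print("stated duration:  ", STATED_TOTAL, f"(gap: {drift_seconds} s)")
print("summed sentences: ", total_sentences, "| stated:", STATED_UTTERANCES)
print("train sentences:  ", train_sentences, "| stated:", STATED_TRAIN)
print("mean utterance (s):", round(total_seconds / total_sentences, 2))
```

The sentence counts add up to the stated 22,058 utterances, and to 21,662 once EVAL is excluded. The per-domain durations add up to 54:54:39, five seconds short of the stated 54:54:44. Rounding each row to whole seconds would explain a gap of that size, but that is an assumption; the card does not say how the totals were derived.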
Provided by:
somu9