kdcyberdude/Punjabi_ASR_datasets

Name: kdcyberdude/Punjabi_ASR_datasets
Creator: kdcyberdude
Published: 2024-05-15 11:04:34
License: No description available

Source: Hugging Face. Updated 2024-05-15; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/kdcyberdude/Punjabi_ASR_datasets

Resource summary:

---
dataset_info:
  features:
  - name: speaker_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: gender
    dtype: string
  - name: duration
    dtype: float64
  splits:
  - name: CMU_Synth_ASR__train
    num_bytes: 13675334135.0
    num_examples: 18179
  - name: CMU_Synth_ASR__test
    num_bytes: 3184547291.0
    num_examples: 7971
  - name: CMU_Synth_ASR__valid
    num_bytes: 2693990107.0
    num_examples: 7971
  - name: Common_Voice_16_1_pa_IN_ASR__train
    num_bytes: 371182890.0
    num_examples: 166
  - name: Common_Voice_16_1_pa_IN_ASR__test
    num_bytes: 17077660.0
    num_examples: 487
  - name: Common_Voice_16_1_pa_IN_ASR__validation
    num_bytes: 9769806.0
    num_examples: 286
  - name: fleurs_pa_ASR__train
    num_bytes: 733609139.625
    num_examples: 1923
  - name: fleurs_pa_ASR__valid
    num_bytes: 86172346.0
    num_examples: 251
  - name: fleurs_pa_ASR__test
    num_bytes: 211794695.0
    num_examples: 574
  - name: Google_Synth_ASR__train
    num_bytes: 4737066155.372
    num_examples: 4589
  - name: Google_Synth_ASR__test
    num_bytes: 906231911.0
    num_examples: 5000
  - name: Google_Synth_ASR__valid
    num_bytes: 850094029.0
    num_examples: 4999
  - name: indicsuperb_pa_ASR_filtered__train
    num_bytes: 15067126022.5
    num_examples: 22268
  - name: indicsuperb_pa_ASR_filtered__test
    num_bytes: 1504487683.5
    num_examples: 7468
  - name: Indicvoice_pa_ASR__train
    num_bytes: 8822867262.5
    num_examples: 17300
  - name: Indicvoice_pa_ASR__valid
    num_bytes: 50023391.0
    num_examples: 95
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    num_bytes: 1099202639.0
    num_examples: 468
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    num_bytes: 124658452.0
    num_examples: 243
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    num_bytes: 124551489.0
    num_examples: 243
  - name: shrutilipi_pa_ASR_filtered__train
    num_bytes: 7852687829.125
    num_examples: 27991
  - name: shrutilipi_pa_ASR_filtered__test
    num_bytes: 321221358.125
    num_examples: 1159
  download_size: 59878729794
  dataset_size: 62443696291.747
configs:
- config_name: default
  data_files:
  - split: CMU_Synth_ASR__train
    path: data/CMU_Synth_ASR__train-*
  - split: CMU_Synth_ASR__test
    path: data/CMU_Synth_ASR__test-*
  - split: CMU_Synth_ASR__valid
    path: data/CMU_Synth_ASR__valid-*
  - split: Common_Voice_16_1_pa_IN_ASR__train
    path: data/Common_Voice_16_1_pa_IN_ASR__train-*
  - split: Common_Voice_16_1_pa_IN_ASR__test
    path: data/Common_Voice_16_1_pa_IN_ASR__test-*
  - split: Common_Voice_16_1_pa_IN_ASR__validation
    path: data/Common_Voice_16_1_pa_IN_ASR__validation-*
  - split: fleurs_pa_ASR__train
    path: data/fleurs_pa_ASR__train-*
  - split: fleurs_pa_ASR__valid
    path: data/fleurs_pa_ASR__valid-*
  - split: fleurs_pa_ASR__test
    path: data/fleurs_pa_ASR__test-*
  - split: Google_Synth_ASR__train
    path: data/Google_Synth_ASR__train-*
  - split: Google_Synth_ASR__test
    path: data/Google_Synth_ASR__test-*
  - split: Google_Synth_ASR__valid
    path: data/Google_Synth_ASR__valid-*
  - split: indicsuperb_pa_ASR_filtered__train
    path: data/indicsuperb_pa_ASR_filtered__train-*
  - split: indicsuperb_pa_ASR_filtered__test
    path: data/indicsuperb_pa_ASR_filtered__test-*
  - split: Indicvoice_pa_ASR__train
    path: data/Indicvoice_pa_ASR__train-*
  - split: Indicvoice_pa_ASR__valid
    path: data/Indicvoice_pa_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test-*
  - split: shrutilipi_pa_ASR_filtered__train
    path: data/shrutilipi_pa_ASR_filtered__train-*
  - split: shrutilipi_pa_ASR_filtered__test
    path: data/shrutilipi_pa_ASR_filtered__test-*
---

Provider:

kdcyberdude

Summary of original information

Dataset overview

Dataset features

speaker_id: string.
audio: audio, sampled at 16000 Hz.
text: string.
gender: string.
duration: floating-point number (float64).
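The feature list above can be turned into a quick record check. The following is a minimal sketch, not the dataset author's tooling: `check_record` and `stream_split` are hypothetical helper names, the check assumes no field is null, and `stream_split` assumes the third-party `datasets` package is installed and the Hugging Face Hub is reachable.

```python
from typing import Any, Mapping

REPO_ID = "kdcyberdude/Punjabi_ASR_datasets"

# Non-audio fields and the Python types they are assumed to decode to.
EXPECTED_FIELDS = {
    "speaker_id": str,
    "text": str,
    "gender": str,
    "duration": float,
}


def check_record(record: Mapping[str, Any]) -> bool:
    """Return True if the record has an audio field and the four typed fields."""
    if "audio" not in record:
        return False
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in EXPECTED_FIELDS.items()
    )


def stream_split(split: str):
    """Stream one split instead of downloading the full dataset.

    The import is lazy so the rest of this sketch runs without `datasets`.
    """
    from datasets import load_dataset  # optional third-party dependency

    return load_dataset(REPO_ID, split=split, streaming=True)
```

For example, `stream_split("fleurs_pa_ASR__valid")` would return an iterable whose records could be passed to `check_record`. The decoded form of the `audio` field depends on the installed `datasets` version, so the check only tests that the field is present.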

Dataset splits

CMU_Synth_ASR__train: 18179 examples, 13675334135 bytes.
CMU_Synth_ASR__test: 7971 examples, 3184547291 bytes.
CMU_Synth_ASR__valid: 7971 examples, 2693990107 bytes.
Common_Voice_16_1_pa_IN_ASR__train: 166 examples, 371182890 bytes.
Common_Voice_16_1_pa_IN_ASR__test: 487 examples, 17077660 bytes.
Common_Voice_16_1_pa_IN_ASR__validation: 286 examples, 9769806 bytes.
fleurs_pa_ASR__train: 1923 examples, 733609139.625 bytes.
fleurs_pa_ASR__valid: 251 examples, 86172346 bytes.
fleurs_pa_ASR__test: 574 examples, 211794695 bytes.
Google_Synth_ASR__train: 4589 examples, 4737066155.372 bytes.
Google_Synth_ASR__test: 5000 examples, 906231911 bytes.
Google_Synth_ASR__valid: 4999 examples, 850094029 bytes.
indicsuperb_pa_ASR_filtered__train: 22268 examples, 15067126022.5 bytes.
indicsuperb_pa_ASR_filtered__test: 7468 examples, 1504487683.5 bytes.
Indicvoice_pa_ASR__train: 17300 examples, 8822867262.5 bytes.
Indicvoice_pa_ASR__valid: 95 examples, 50023391 bytes.
PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train: 468 examples, 1099202639 bytes.
PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid: 243 examples, 124658452 bytes.
PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test: 243 examples, 124551489 bytes.
shrutilipi_pa_ASR_filtered__train: 27991 examples, 7852687829.125 bytes.
shrutilipi_pa_ASR_filtered__test: 1159 examples, 321221358.125 bytes.
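Each split name above joins a source corpus and a partition with a double underscore. As an illustration, the sketch below groups the listed example counts by source corpus; `group_by_source` is a hypothetical helper, and the counts are copied from the list above.

```python
from collections import defaultdict

# Example counts per split, copied from the split list.
SPLIT_EXAMPLES = {
    "CMU_Synth_ASR__train": 18179,
    "CMU_Synth_ASR__test": 7971,
    "CMU_Synth_ASR__valid": 7971,
    "Common_Voice_16_1_pa_IN_ASR__train": 166,
    "Common_Voice_16_1_pa_IN_ASR__test": 487,
    "Common_Voice_16_1_pa_IN_ASR__validation": 286,
    "fleurs_pa_ASR__train": 1923,
    "fleurs_pa_ASR__valid": 251,
    "fleurs_pa_ASR__test": 574,
    "Google_Synth_ASR__train": 4589,
    "Google_Synth_ASR__test": 5000,
    "Google_Synth_ASR__valid": 4999,
    "indicsuperb_pa_ASR_filtered__train": 22268,
    "indicsuperb_pa_ASR_filtered__test": 7468,
    "Indicvoice_pa_ASR__train": 17300,
    "Indicvoice_pa_ASR__valid": 95,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train": 468,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid": 243,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test": 243,
    "shrutilipi_pa_ASR_filtered__train": 27991,
    "shrutilipi_pa_ASR_filtered__test": 1159,
}


def group_by_source(split_examples):
    """Map each source corpus to a {partition: example_count} dict."""
    grouped = defaultdict(dict)
    for split_name, count in split_examples.items():
        # Split on the last "__": left side is the source, right side the partition.
        source, _, partition = split_name.rpartition("__")
        grouped[source][partition] = count
    return dict(grouped)


GROUPED = group_by_source(SPLIT_EXAMPLES)
TOTALS = {source: sum(parts.values()) for source, parts in GROUPED.items()}

for source, parts in GROUPED.items():
    print(f"{source}: {parts} (total {TOTALS[source]})")
```

Partition names are not uniform across sources: Common Voice uses `validation` where the others use `valid`, and some sources have no validation or no test partition, so code that selects partitions by name should handle both cases.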

Dataset size

Download size: 59878729794 bytes.
Dataset size: 62443696291.747 bytes.
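The reported dataset size can be cross-checked against the per-split byte counts. The sketch below sums the split sizes with `Decimal`, to avoid floating-point rounding on the fractional byte values, and converts the two reported sizes to decimal gigabytes; `to_gb` is a hypothetical helper.

```python
from decimal import Decimal

# Per-split num_bytes values, copied as strings so Decimal keeps them exact.
SPLIT_BYTES = [
    "13675334135.0", "3184547291.0", "2693990107.0",
    "371182890.0", "17077660.0", "9769806.0",
    "733609139.625", "86172346.0", "211794695.0",
    "4737066155.372", "906231911.0", "850094029.0",
    "15067126022.5", "1504487683.5",
    "8822867262.5", "50023391.0",
    "1099202639.0", "124658452.0", "124551489.0",
    "7852687829.125", "321221358.125",
]

REPORTED_DATASET_SIZE = Decimal("62443696291.747")
REPORTED_DOWNLOAD_SIZE = Decimal("59878729794")

# Exact sum of all split sizes.
total = sum((Decimal(value) for value in SPLIT_BYTES), Decimal(0))


def to_gb(num_bytes) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return float(num_bytes) / 1e9


print("split sizes sum to reported dataset size:", total == REPORTED_DATASET_SIZE)
print(f"dataset size:  {to_gb(REPORTED_DATASET_SIZE):.2f} GB")
print(f"download size: {to_gb(REPORTED_DOWNLOAD_SIZE):.2f} GB")
```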

Configuration

config_name: default
- Contains the data file paths for each split (CMU_Synth_ASR__train, CMU_Synth_ASR__test, and so on); each path has the form data/<split name>-*.
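The path pattern `data/<split name>-*` can be illustrated with standard glob matching. This is a sketch only: `split_pattern` and `files_for_split` are hypothetical helpers, and the shard file names in the usage example are invented for illustration, since the listing does not give the actual file names.

```python
from fnmatch import fnmatchcase


def split_pattern(split: str) -> str:
    """Build the glob pattern the default config uses for one split."""
    return f"data/{split}-*"


def files_for_split(split: str, paths: list[str]) -> list[str]:
    """Return the paths that match the split's glob pattern (case-sensitive)."""
    pattern = split_pattern(split)
    return [path for path in paths if fnmatchcase(path, pattern)]


# Hypothetical shard names, for illustration only.
example_paths = [
    "data/CMU_Synth_ASR__test-00000-of-00002.parquet",
    "data/CMU_Synth_ASR__test-00001-of-00002.parquet",
    "data/CMU_Synth_ASR__train-00000-of-00002.parquet",
]
print(files_for_split("CMU_Synth_ASR__test", example_paths))
```

Because the pattern includes the hyphen after the split name, `CMU_Synth_ASR__test` does not also match files belonging to `CMU_Synth_ASR__train`.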
