kdcyberdude/Punjabi_ASR_datasets
Hugging Face | Updated 2024-05-15 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/kdcyberdude/Punjabi_ASR_datasets
Resource description:
---
dataset_info:
  features:
  - name: speaker_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: gender
    dtype: string
  - name: duration
    dtype: float64
  splits:
  - name: CMU_Synth_ASR__train
    num_bytes: 13675334135.0
    num_examples: 18179
  - name: CMU_Synth_ASR__test
    num_bytes: 3184547291.0
    num_examples: 7971
  - name: CMU_Synth_ASR__valid
    num_bytes: 2693990107.0
    num_examples: 7971
  - name: Common_Voice_16_1_pa_IN_ASR__train
    num_bytes: 371182890.0
    num_examples: 166
  - name: Common_Voice_16_1_pa_IN_ASR__test
    num_bytes: 17077660.0
    num_examples: 487
  - name: Common_Voice_16_1_pa_IN_ASR__validation
    num_bytes: 9769806.0
    num_examples: 286
  - name: fleurs_pa_ASR__train
    num_bytes: 733609139.625
    num_examples: 1923
  - name: fleurs_pa_ASR__valid
    num_bytes: 86172346.0
    num_examples: 251
  - name: fleurs_pa_ASR__test
    num_bytes: 211794695.0
    num_examples: 574
  - name: Google_Synth_ASR__train
    num_bytes: 4737066155.372
    num_examples: 4589
  - name: Google_Synth_ASR__test
    num_bytes: 906231911.0
    num_examples: 5000
  - name: Google_Synth_ASR__valid
    num_bytes: 850094029.0
    num_examples: 4999
  - name: indicsuperb_pa_ASR_filtered__train
    num_bytes: 15067126022.5
    num_examples: 22268
  - name: indicsuperb_pa_ASR_filtered__test
    num_bytes: 1504487683.5
    num_examples: 7468
  - name: Indicvoice_pa_ASR__train
    num_bytes: 8822867262.5
    num_examples: 17300
  - name: Indicvoice_pa_ASR__valid
    num_bytes: 50023391.0
    num_examples: 95
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    num_bytes: 1099202639.0
    num_examples: 468
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    num_bytes: 124658452.0
    num_examples: 243
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    num_bytes: 124551489.0
    num_examples: 243
  - name: shrutilipi_pa_ASR_filtered__train
    num_bytes: 7852687829.125
    num_examples: 27991
  - name: shrutilipi_pa_ASR_filtered__test
    num_bytes: 321221358.125
    num_examples: 1159
  download_size: 59878729794
  dataset_size: 62443696291.747
configs:
- config_name: default
  data_files:
  - split: CMU_Synth_ASR__train
    path: data/CMU_Synth_ASR__train-*
  - split: CMU_Synth_ASR__test
    path: data/CMU_Synth_ASR__test-*
  - split: CMU_Synth_ASR__valid
    path: data/CMU_Synth_ASR__valid-*
  - split: Common_Voice_16_1_pa_IN_ASR__train
    path: data/Common_Voice_16_1_pa_IN_ASR__train-*
  - split: Common_Voice_16_1_pa_IN_ASR__test
    path: data/Common_Voice_16_1_pa_IN_ASR__test-*
  - split: Common_Voice_16_1_pa_IN_ASR__validation
    path: data/Common_Voice_16_1_pa_IN_ASR__validation-*
  - split: fleurs_pa_ASR__train
    path: data/fleurs_pa_ASR__train-*
  - split: fleurs_pa_ASR__valid
    path: data/fleurs_pa_ASR__valid-*
  - split: fleurs_pa_ASR__test
    path: data/fleurs_pa_ASR__test-*
  - split: Google_Synth_ASR__train
    path: data/Google_Synth_ASR__train-*
  - split: Google_Synth_ASR__test
    path: data/Google_Synth_ASR__test-*
  - split: Google_Synth_ASR__valid
    path: data/Google_Synth_ASR__valid-*
  - split: indicsuperb_pa_ASR_filtered__train
    path: data/indicsuperb_pa_ASR_filtered__train-*
  - split: indicsuperb_pa_ASR_filtered__test
    path: data/indicsuperb_pa_ASR_filtered__test-*
  - split: Indicvoice_pa_ASR__train
    path: data/Indicvoice_pa_ASR__train-*
  - split: Indicvoice_pa_ASR__valid
    path: data/Indicvoice_pa_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test-*
  - split: shrutilipi_pa_ASR_filtered__train
    path: data/shrutilipi_pa_ASR_filtered__train-*
  - split: shrutilipi_pa_ASR_filtered__test
    path: data/shrutilipi_pa_ASR_filtered__test-*
---
Provider:
kdcyberdude
Summary of original information
Dataset overview
Dataset features
- speaker_id: string.
- audio: audio, sampled at 16000 Hz.
- text: string.
- gender: string.
- duration: floating-point number (float64).
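The feature list above can be turned into a small record check. This is only a sketch under stated assumptions: it assumes a decoded `audio` value is a mapping with `array` and `sampling_rate` keys (how some versions of the Hugging Face `datasets` library expose audio; other versions may differ) and that `duration` is measured in seconds, which the card does not state. `check_record` and the sample record are hypothetical names and data used for illustration.

```python
from typing import Any, List, Mapping

# Field names and types taken from the dataset card.
EXPECTED_FIELDS = {"speaker_id": str, "text": str, "gender": str, "duration": float}
SAMPLING_RATE = 16000  # from the card


def check_record(record: Mapping[str, Any], tolerance_s: float = 0.05) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")

    audio = record.get("audio")
    if audio is None:
        problems.append("missing field: audio")
        return problems

    # Assumption: decoded audio is a mapping with "array" and "sampling_rate".
    if audio["sampling_rate"] != SAMPLING_RATE:
        problems.append(f"unexpected sampling rate: {audio['sampling_rate']}")

    # Assumption: `duration` is in seconds, so it should match samples / rate.
    measured = len(audio["array"]) / audio["sampling_rate"]
    duration = record.get("duration")
    if isinstance(duration, float) and abs(measured - duration) > tolerance_s:
        problems.append(f"duration {duration} s does not match audio length {measured} s")
    return problems


# Synthetic record (not real data): two seconds of silence at 16 kHz.
example = {
    "speaker_id": "spk_0001",
    "audio": {"array": [0.0] * 32000, "sampling_rate": 16000},
    "text": "placeholder transcript",
    "gender": "female",
    "duration": 2.0,
}
print(check_record(example))
```

Returning a list of problems rather than raising keeps the check usable as a filter when scanning many records.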
Dataset splits
- CMU_Synth_ASR__train: 18179 examples, 13675334135 bytes.
- CMU_Synth_ASR__test: 7971 examples, 3184547291 bytes.
- CMU_Synth_ASR__valid: 7971 examples, 2693990107 bytes.
- Common_Voice_16_1_pa_IN_ASR__train: 166 examples, 371182890 bytes.
- Common_Voice_16_1_pa_IN_ASR__test: 487 examples, 17077660 bytes.
- Common_Voice_16_1_pa_IN_ASR__validation: 286 examples, 9769806 bytes.
- fleurs_pa_ASR__train: 1923 examples, 733609139.625 bytes.
- fleurs_pa_ASR__valid: 251 examples, 86172346 bytes.
- fleurs_pa_ASR__test: 574 examples, 211794695 bytes.
- Google_Synth_ASR__train: 4589 examples, 4737066155.372 bytes.
- Google_Synth_ASR__test: 5000 examples, 906231911 bytes.
- Google_Synth_ASR__valid: 4999 examples, 850094029 bytes.
- indicsuperb_pa_ASR_filtered__train: 22268 examples, 15067126022.5 bytes.
- indicsuperb_pa_ASR_filtered__test: 7468 examples, 1504487683.5 bytes.
- Indicvoice_pa_ASR__train: 17300 examples, 8822867262.5 bytes.
- Indicvoice_pa_ASR__valid: 95 examples, 50023391 bytes.
- PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train: 468 examples, 1099202639 bytes.
- PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid: 243 examples, 124658452 bytes.
- PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test: 243 examples, 124551489 bytes.
- shrutilipi_pa_ASR_filtered__train: 27991 examples, 7852687829.125 bytes.
- shrutilipi_pa_ASR_filtered__test: 1159 examples, 321221358.125 bytes.
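Each split name above appears to follow a `<source>__<partition>` convention, with a double underscore separating the two parts. Reading the names this way is an interpretation of the list, not something the card states. Under that assumption, a minimal sketch for grouping the example counts by source corpus could look like this; `parse_split` and `examples_per_source` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Example counts copied from the split list in the dataset card.
SPLIT_EXAMPLES = {
    "CMU_Synth_ASR__train": 18179,
    "CMU_Synth_ASR__test": 7971,
    "CMU_Synth_ASR__valid": 7971,
    "Common_Voice_16_1_pa_IN_ASR__train": 166,
    "Common_Voice_16_1_pa_IN_ASR__test": 487,
    "Common_Voice_16_1_pa_IN_ASR__validation": 286,
    "fleurs_pa_ASR__train": 1923,
    "fleurs_pa_ASR__valid": 251,
    "fleurs_pa_ASR__test": 574,
    "Google_Synth_ASR__train": 4589,
    "Google_Synth_ASR__test": 5000,
    "Google_Synth_ASR__valid": 4999,
    "indicsuperb_pa_ASR_filtered__train": 22268,
    "indicsuperb_pa_ASR_filtered__test": 7468,
    "Indicvoice_pa_ASR__train": 17300,
    "Indicvoice_pa_ASR__valid": 95,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train": 468,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid": 243,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test": 243,
    "shrutilipi_pa_ASR_filtered__train": 27991,
    "shrutilipi_pa_ASR_filtered__test": 1159,
}


def parse_split(name: str) -> Tuple[str, str]:
    """Split 'fleurs_pa_ASR__train' into ('fleurs_pa_ASR', 'train')."""
    source, _, partition = name.rpartition("__")
    return source, partition


def examples_per_source(splits: Dict[str, int]) -> Dict[str, Dict[str, int]]:
    """Group example counts as {source: {partition: count}}."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, count in splits.items():
        source, partition = parse_split(name)
        grouped[source][partition] = count
    return dict(grouped)


per_source = examples_per_source(SPLIT_EXAMPLES)
for source, parts in per_source.items():
    print(f"{source}: {sum(parts.values())} examples in {sorted(parts)}")
print("total examples:", sum(SPLIT_EXAMPLES.values()))
```

Note that the partition labels are not uniform across sources: most use `valid`, Common Voice uses `validation`, and some sources have no validation or no test partition at all, so code that selects partitions by name should handle both spellings and missing entries.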
Dataset size
- Download size: 59878729794 bytes.
- Dataset size: 62443696291.747 bytes.
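Raw byte counts of this magnitude are hard to read at a glance. As a rough sketch, the two figures above can be converted to decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes); the helper names `to_gb` and `to_gib` are illustrative only.

```python
# Sizes copied from the dataset card.
DOWNLOAD_SIZE_BYTES = 59878729794
DATASET_SIZE_BYTES = 62443696291.747


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes: float) -> float:
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


for label, size in [("download", DOWNLOAD_SIZE_BYTES), ("dataset", DATASET_SIZE_BYTES)]:
    print(f"{label}: {to_gb(size):.2f} GB ({to_gib(size):.2f} GiB)")

# Ratio of the download size to the reported dataset size.
ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES
print(f"download / dataset ratio: {ratio:.3f}")
```

Either way, a full download needs roughly 60 GB of free disk space, so loading individual splits is likely more practical than fetching everything.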
Configuration
- config_name: default
- Contains the data file paths for each split, such as CMU_Synth_ASR__train and CMU_Synth_ASR__test; each path follows the pattern data/<split name>-*.
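The path pattern described above can be illustrated with a short sketch that maps a split name to its glob pattern and matches it against file names. The shard file names below are hypothetical (the card gives only the `data/<split name>-*` pattern, not actual file names). The `stream_split` helper shows one way to load a single split with the Hugging Face `datasets` library; it is defined but not called here, since calling it requires network access and the library to be installed.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

REPO_ID = "kdcyberdude/Punjabi_ASR_datasets"


def shard_pattern(split: str) -> str:
    """Glob pattern for a split's data files, following the card's configs section."""
    return f"data/{split}-*"


def shards_for_split(split: str, files: Iterable[str]) -> List[str]:
    """Return the file names that belong to the given split."""
    pattern = shard_pattern(split)
    return [name for name in files if fnmatchcase(name, pattern)]


def stream_split(split: str):
    """Stream one split without downloading the whole dataset (needs network access)."""
    from datasets import load_dataset  # imported lazily; third-party dependency

    return load_dataset(REPO_ID, split=split, streaming=True)


# Hypothetical shard names, for illustration only.
files = [
    "data/fleurs_pa_ASR__train-00000-of-00002.parquet",
    "data/fleurs_pa_ASR__train-00001-of-00002.parquet",
    "data/fleurs_pa_ASR__test-00000-of-00001.parquet",
    "data/Common_Voice_16_1_pa_IN_ASR__validation-00000-of-00001.parquet",
]
print(shards_for_split("fleurs_pa_ASR__train", files))
```

The hyphen after the split name matters: a pattern ending in `__valid-*` does not match files of a split ending in `__validation`, which keeps the two spellings used in this dataset from colliding.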



