
kdcyberdude/Punjabi_ASR_datasets

Source: Hugging Face. Updated 2024-05-15. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/kdcyberdude/Punjabi_ASR_datasets
Resource description:
---
dataset_info:
  features:
  - name: speaker_id
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: gender
    dtype: string
  - name: duration
    dtype: float64
  splits:
  - name: CMU_Synth_ASR__train
    num_bytes: 13675334135.0
    num_examples: 18179
  - name: CMU_Synth_ASR__test
    num_bytes: 3184547291.0
    num_examples: 7971
  - name: CMU_Synth_ASR__valid
    num_bytes: 2693990107.0
    num_examples: 7971
  - name: Common_Voice_16_1_pa_IN_ASR__train
    num_bytes: 371182890.0
    num_examples: 166
  - name: Common_Voice_16_1_pa_IN_ASR__test
    num_bytes: 17077660.0
    num_examples: 487
  - name: Common_Voice_16_1_pa_IN_ASR__validation
    num_bytes: 9769806.0
    num_examples: 286
  - name: fleurs_pa_ASR__train
    num_bytes: 733609139.625
    num_examples: 1923
  - name: fleurs_pa_ASR__valid
    num_bytes: 86172346.0
    num_examples: 251
  - name: fleurs_pa_ASR__test
    num_bytes: 211794695.0
    num_examples: 574
  - name: Google_Synth_ASR__train
    num_bytes: 4737066155.372
    num_examples: 4589
  - name: Google_Synth_ASR__test
    num_bytes: 906231911.0
    num_examples: 5000
  - name: Google_Synth_ASR__valid
    num_bytes: 850094029.0
    num_examples: 4999
  - name: indicsuperb_pa_ASR_filtered__train
    num_bytes: 15067126022.5
    num_examples: 22268
  - name: indicsuperb_pa_ASR_filtered__test
    num_bytes: 1504487683.5
    num_examples: 7468
  - name: Indicvoice_pa_ASR__train
    num_bytes: 8822867262.5
    num_examples: 17300
  - name: Indicvoice_pa_ASR__valid
    num_bytes: 50023391.0
    num_examples: 95
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    num_bytes: 1099202639.0
    num_examples: 468
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    num_bytes: 124658452.0
    num_examples: 243
  - name: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    num_bytes: 124551489.0
    num_examples: 243
  - name: shrutilipi_pa_ASR_filtered__train
    num_bytes: 7852687829.125
    num_examples: 27991
  - name: shrutilipi_pa_ASR_filtered__test
    num_bytes: 321221358.125
    num_examples: 1159
  download_size: 59878729794
  dataset_size: 62443696291.747
configs:
- config_name: default
  data_files:
  - split: CMU_Synth_ASR__train
    path: data/CMU_Synth_ASR__train-*
  - split: CMU_Synth_ASR__test
    path: data/CMU_Synth_ASR__test-*
  - split: CMU_Synth_ASR__valid
    path: data/CMU_Synth_ASR__valid-*
  - split: Common_Voice_16_1_pa_IN_ASR__train
    path: data/Common_Voice_16_1_pa_IN_ASR__train-*
  - split: Common_Voice_16_1_pa_IN_ASR__test
    path: data/Common_Voice_16_1_pa_IN_ASR__test-*
  - split: Common_Voice_16_1_pa_IN_ASR__validation
    path: data/Common_Voice_16_1_pa_IN_ASR__validation-*
  - split: fleurs_pa_ASR__train
    path: data/fleurs_pa_ASR__train-*
  - split: fleurs_pa_ASR__valid
    path: data/fleurs_pa_ASR__valid-*
  - split: fleurs_pa_ASR__test
    path: data/fleurs_pa_ASR__test-*
  - split: Google_Synth_ASR__train
    path: data/Google_Synth_ASR__train-*
  - split: Google_Synth_ASR__test
    path: data/Google_Synth_ASR__test-*
  - split: Google_Synth_ASR__valid
    path: data/Google_Synth_ASR__valid-*
  - split: indicsuperb_pa_ASR_filtered__train
    path: data/indicsuperb_pa_ASR_filtered__train-*
  - split: indicsuperb_pa_ASR_filtered__test
    path: data/indicsuperb_pa_ASR_filtered__test-*
  - split: Indicvoice_pa_ASR__train
    path: data/Indicvoice_pa_ASR__train-*
  - split: Indicvoice_pa_ASR__valid
    path: data/Indicvoice_pa_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid-*
  - split: PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test
    path: data/PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test-*
  - split: shrutilipi_pa_ASR_filtered__train
    path: data/shrutilipi_pa_ASR_filtered__train-*
  - split: shrutilipi_pa_ASR_filtered__test
    path: data/shrutilipi_pa_ASR_filtered__test-*
---
Provided by:
kdcyberdude
Summary of original information

Dataset overview

Dataset features

  • speaker_id: string.
  • audio: audio, with a sampling rate of 16000.
  • text: string.
  • gender: string.
  • duration: floating-point number (float64).
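
The five fields above can be checked against a record with a few lines of Python. This is a minimal sketch, not part of the dataset itself: it assumes that a decoded `audio` value is a mapping with a `sampling_rate` key (the shape the Hugging Face `datasets` library commonly uses for audio features), and `check_record` is a hypothetical helper name.

```python
from typing import Any, Mapping

# Field names and types taken from the feature list above.
EXPECTED_FIELDS = {
    "speaker_id": str,
    "text": str,
    "gender": str,
    "duration": float,
}
EXPECTED_SAMPLING_RATE = 16000


def check_record(record: Mapping[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")

    # Assumption: decoded audio is a mapping that carries its sampling rate.
    audio = record.get("audio")
    if not isinstance(audio, Mapping):
        problems.append("missing or non-mapping field: audio")
    elif audio.get("sampling_rate") != EXPECTED_SAMPLING_RATE:
        problems.append(f"unexpected sampling rate: {audio.get('sampling_rate')}")
    return problems


if __name__ == "__main__":
    # Made-up record used only to exercise the check.
    sample = {
        "speaker_id": "spk_001",
        "audio": {"array": [0.0, 0.1], "sampling_rate": 16000},
        "text": "example transcript",
        "gender": "female",
        "duration": 1.5,
    }
    print(check_record(sample))
```

A check like this is mainly useful when mixing splits from different source corpora, since it catches a missing field or an unexpected sampling rate early.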

Dataset splits

  • CMU_Synth_ASR__train: 18179 examples, 13675334135 bytes.
  • CMU_Synth_ASR__test: 7971 examples, 3184547291 bytes.
  • CMU_Synth_ASR__valid: 7971 examples, 2693990107 bytes.
  • Common_Voice_16_1_pa_IN_ASR__train: 166 examples, 371182890 bytes.
  • Common_Voice_16_1_pa_IN_ASR__test: 487 examples, 17077660 bytes.
  • Common_Voice_16_1_pa_IN_ASR__validation: 286 examples, 9769806 bytes.
  • fleurs_pa_ASR__train: 1923 examples, 733609139.625 bytes.
  • fleurs_pa_ASR__valid: 251 examples, 86172346 bytes.
  • fleurs_pa_ASR__test: 574 examples, 211794695 bytes.
  • Google_Synth_ASR__train: 4589 examples, 4737066155.372 bytes.
  • Google_Synth_ASR__test: 5000 examples, 906231911 bytes.
  • Google_Synth_ASR__valid: 4999 examples, 850094029 bytes.
  • indicsuperb_pa_ASR_filtered__train: 22268 examples, 15067126022.5 bytes.
  • indicsuperb_pa_ASR_filtered__test: 7468 examples, 1504487683.5 bytes.
  • Indicvoice_pa_ASR__train: 17300 examples, 8822867262.5 bytes.
  • Indicvoice_pa_ASR__valid: 95 examples, 50023391 bytes.
  • PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train: 468 examples, 1099202639 bytes.
  • PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid: 243 examples, 124658452 bytes.
  • PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test: 243 examples, 124551489 bytes.
  • shrutilipi_pa_ASR_filtered__train: 27991 examples, 7852687829.125 bytes.
  • shrutilipi_pa_ASR_filtered__test: 1159 examples, 321221358.125 bytes.
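
Each split name follows the pattern `<corpus>__<partition>`, so the list above can be regrouped by source corpus. The sketch below copies the example counts from the list; the helper names (`parse_split`, `examples_per_corpus`) are illustrative only. Two quirks are visible in the names: Common Voice calls its held-out partition `validation` while the other corpora use `valid`, and not every corpus ships all three partitions.

```python
from collections import defaultdict

# Example counts copied from the split list above.
SPLIT_EXAMPLES = {
    "CMU_Synth_ASR__train": 18179,
    "CMU_Synth_ASR__test": 7971,
    "CMU_Synth_ASR__valid": 7971,
    "Common_Voice_16_1_pa_IN_ASR__train": 166,
    "Common_Voice_16_1_pa_IN_ASR__test": 487,
    "Common_Voice_16_1_pa_IN_ASR__validation": 286,
    "fleurs_pa_ASR__train": 1923,
    "fleurs_pa_ASR__valid": 251,
    "fleurs_pa_ASR__test": 574,
    "Google_Synth_ASR__train": 4589,
    "Google_Synth_ASR__test": 5000,
    "Google_Synth_ASR__valid": 4999,
    "indicsuperb_pa_ASR_filtered__train": 22268,
    "indicsuperb_pa_ASR_filtered__test": 7468,
    "Indicvoice_pa_ASR__train": 17300,
    "Indicvoice_pa_ASR__valid": 95,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__train": 468,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__valid": 243,
    "PunjabiSpeech_A_labeled_Speech_Corpus_ASR__test": 243,
    "shrutilipi_pa_ASR_filtered__train": 27991,
    "shrutilipi_pa_ASR_filtered__test": 1159,
}

# Common Voice spells its held-out partition "validation"; map it to "valid".
PARTITION_ALIASES = {"validation": "valid"}


def parse_split(split_name: str) -> tuple[str, str]:
    """Split '<corpus>__<partition>' on the last double underscore."""
    corpus, partition = split_name.rsplit("__", 1)
    return corpus, PARTITION_ALIASES.get(partition, partition)


def examples_per_corpus(split_examples: dict[str, int]) -> dict[str, int]:
    """Sum example counts over all partitions of each source corpus."""
    totals: dict[str, int] = defaultdict(int)
    for split_name, count in split_examples.items():
        corpus, _ = parse_split(split_name)
        totals[corpus] += count
    return dict(totals)


if __name__ == "__main__":
    totals = examples_per_corpus(SPLIT_EXAMPLES)
    for corpus, count in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{corpus}: {count}")
    print("all splits:", sum(totals.values()))
```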

Dataset size

  • Download size: 59878729794 bytes.
  • Dataset size: 62443696291.747 bytes.
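
For planning disk space, the two byte counts above are easier to read in larger units. The following sketch converts them to decimal gigabytes and binary gibibytes and reports the download-to-dataset ratio; the function names are illustrative, and the only inputs are the two figures listed above.

```python
# Byte counts copied from the size figures above.
DOWNLOAD_BYTES = 59878729794
DATASET_BYTES = 62443696291.747


def to_gb(num_bytes: float) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes: float) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print(f"download: {to_gb(DOWNLOAD_BYTES):.2f} GB ({to_gib(DOWNLOAD_BYTES):.2f} GiB)")
    print(f"dataset:  {to_gb(DATASET_BYTES):.2f} GB ({to_gib(DATASET_BYTES):.2f} GiB)")
    print(f"download / dataset ratio: {DOWNLOAD_BYTES / DATASET_BYTES:.3f}")
```

The download is only slightly smaller than the reported dataset size, so both figures are roughly 60 GB in decimal units.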

Configuration

  • config_name: default
    • Contains the data file paths for each split, such as CMU_Synth_ASR__train and CMU_Synth_ASR__test; paths follow the pattern data/<split name>-*
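
The `data/<split name>-*` pattern can be illustrated with a short sketch. The shard filenames in the usage section are hypothetical (the listing does not show actual filenames), and `load_split` assumes the Hugging Face `datasets` library is installed and that the repository is reachable; it is imported lazily so the pattern helpers work without it.

```python
from fnmatch import fnmatchcase

REPO_ID = "kdcyberdude/Punjabi_ASR_datasets"


def shard_pattern(split_name: str) -> str:
    """Glob pattern the default config uses for a split's data files."""
    return f"data/{split_name}-*"


def belongs_to_split(path: str, split_name: str) -> bool:
    """True if a repository file path matches the split's glob pattern."""
    return fnmatchcase(path, shard_pattern(split_name))


def load_split(split_name: str, streaming: bool = True):
    """Load one split by name (assumes `datasets` is installed).

    Streaming avoids downloading the full dataset before reading the
    first example.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    return load_dataset(REPO_ID, split=split_name, streaming=streaming)


if __name__ == "__main__":
    # Hypothetical shard filenames, used only to demonstrate the matching.
    files = [
        "data/fleurs_pa_ASR__test-00000-of-00001.parquet",
        "data/fleurs_pa_ASR__train-00000-of-00002.parquet",
    ]
    for path in files:
        print(path, belongs_to_split(path, "fleurs_pa_ASR__test"))
```

Loading a single small split such as `fleurs_pa_ASR__valid` with `load_split` is a reasonable first step before committing to the larger splits.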