
jq/salt-asr-data-transcriptions

Source: Hugging Face | Updated: 2024-05-03 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/jq/salt-asr-data-transcriptions
Resource overview:
---
dataset_info:
- config_name: multispeaker-ach
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 704574
    num_examples: 4811
  - name: dev
    num_bytes: 14750
    num_examples: 101
  - name: test
    num_bytes: 14497
    num_examples: 96
  download_size: 401928
  dataset_size: 733821
- config_name: multispeaker-eng
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: dev
    num_bytes: 15282
    num_examples: 100
  - name: test
    num_bytes: 15194
    num_examples: 96
  - name: train
    num_bytes: 734854
    num_examples: 4797
  download_size: 402022
  dataset_size: 765330
- config_name: multispeaker-lgg
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 704330
    num_examples: 4811
  - name: dev
    num_bytes: 14684
    num_examples: 101
  - name: test
    num_bytes: 14411
    num_examples: 96
  download_size: 406173
  dataset_size: 733425
- config_name: multispeaker-lug
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 801734
    num_examples: 5016
  - name: dev
    num_bytes: 16421
    num_examples: 103
  - name: test
    num_bytes: 16270
    num_examples: 97
  download_size: 819770
  dataset_size: 834425
- config_name: multispeaker-nyn
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 700078
    num_examples: 4811
  - name: dev
    num_bytes: 14574
    num_examples: 101
  - name: test
    num_bytes: 14351
    num_examples: 96
  download_size: 417568
  dataset_size: 729003
- config_name: multispeaker-teo
  features:
  - name: id
    dtype: int64
  - name: text
    dtype: string
  - name: audio_language
    dtype: string
  - name: is_studio
    dtype: bool
  - name: speaker_id
    dtype: string
  - name: sample_rate
    dtype: int64
  - name: transcription
    dtype: string
  - name: edit_distance
    dtype: int64
  splits:
  - name: train
    num_bytes: 690253
    num_examples: 4811
  - name: dev
    num_bytes: 14389
    num_examples: 101
  - name: test
    num_bytes: 14113
    num_examples: 96
  download_size: 401353
  dataset_size: 718755
configs:
- config_name: multispeaker-ach
  data_files:
  - split: train
    path: multispeaker-ach/train-*
  - split: dev
    path: multispeaker-ach/dev-*
  - split: test
    path: multispeaker-ach/test-*
- config_name: multispeaker-eng
  data_files:
  - split: dev
    path: multispeaker-eng/dev-*
  - split: test
    path: multispeaker-eng/test-*
  - split: train
    path: multispeaker-eng/train-*
- config_name: multispeaker-lgg
  data_files:
  - split: train
    path: multispeaker-lgg/train-*
  - split: dev
    path: multispeaker-lgg/dev-*
  - split: test
    path: multispeaker-lgg/test-*
- config_name: multispeaker-lug
  data_files:
  - split: train
    path: multispeaker-lug/train-*
  - split: dev
    path: multispeaker-lug/dev-*
  - split: test
    path: multispeaker-lug/test-*
- config_name: multispeaker-nyn
  data_files:
  - split: train
    path: multispeaker-nyn/train-*
  - split: dev
    path: multispeaker-nyn/dev-*
  - split: test
    path: multispeaker-nyn/test-*
- config_name: multispeaker-teo
  data_files:
  - split: train
    path: multispeaker-teo/train-*
  - split: dev
    path: multispeaker-teo/dev-*
  - split: test
    path: multispeaker-teo/test-*
---
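Every configuration in the card metadata declares the same eight columns. As a minimal sketch, the schema can be mirrored in Python to sanity-check a row before using it. The mapping of dtypes to Python types (int64 to int, string to str, bool to bool), the helper name schema_problems, and the example row are assumptions made for illustration. The row values are invented and are not taken from the dataset.

```python
from typing import Any, List, Mapping

# Column names from the card metadata, mapped to the Python types that
# int64 / string / bool values are assumed to arrive as.
EXPECTED_COLUMNS = {
    "id": int,
    "text": str,
    "audio_language": str,
    "is_studio": bool,
    "speaker_id": str,
    "sample_rate": int,
    "transcription": str,
    "edit_distance": int,
}


def schema_problems(row: Mapping[str, Any]) -> List[str]:
    """Return human-readable schema problems; an empty list means the row matches."""
    problems = []
    for name, expected in EXPECTED_COLUMNS.items():
        if name not in row:
            problems.append(f"missing column: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly for int columns.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in row:
        if name not in EXPECTED_COLUMNS:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical row, invented for illustration only.
example_row = {
    "id": 1,
    "text": "placeholder reference sentence",
    "audio_language": "eng",
    "is_studio": False,
    "speaker_id": "speaker-0",
    "sample_rate": 16000,
    "transcription": "placeholder transcribed sentence",
    "edit_distance": 2,
}

print(schema_problems(example_row))
```

A check like this is cheap to run on the first few rows of each split and catches renamed or missing columns early.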
Provider:
jq
Summary of original information

Dataset overview

Dataset configurations and features

  1. multispeaker-ach

    • Features:
      • id: int64
      • text: string
      • audio_language: string
      • is_studio: bool
      • speaker_id: string
      • sample_rate: int64
      • transcription: string
      • edit_distance: int64
    • Splits:
      • train: 4811 examples, 704574 bytes
      • dev: 101 examples, 14750 bytes
      • test: 96 examples, 14497 bytes
    • Download size: 401928 bytes
    • Dataset size: 733821 bytes
  2. multispeaker-eng

    • Features: same as multispeaker-ach
    • Splits:
      • train: 4797 examples, 734854 bytes
      • dev: 100 examples, 15282 bytes
      • test: 96 examples, 15194 bytes
    • Download size: 402022 bytes
    • Dataset size: 765330 bytes
  3. multispeaker-lgg

    • Features: same as multispeaker-ach
    • Splits:
      • train: 4811 examples, 704330 bytes
      • dev: 101 examples, 14684 bytes
      • test: 96 examples, 14411 bytes
    • Download size: 406173 bytes
    • Dataset size: 733425 bytes
  4. multispeaker-lug

    • Features: same as multispeaker-ach
    • Splits:
      • train: 5016 examples, 801734 bytes
      • dev: 103 examples, 16421 bytes
      • test: 97 examples, 16270 bytes
    • Download size: 819770 bytes
    • Dataset size: 834425 bytes
  5. multispeaker-nyn

    • Features: same as multispeaker-ach
    • Splits:
      • train: 4811 examples, 700078 bytes
      • dev: 101 examples, 14574 bytes
      • test: 96 examples, 14351 bytes
    • Download size: 417568 bytes
    • Dataset size: 729003 bytes
  6. multispeaker-teo

    • Features: same as multispeaker-ach
    • Splits:
      • train: 4811 examples, 690253 bytes
      • dev: 101 examples, 14389 bytes
      • test: 96 examples, 14113 bytes
    • Download size: 401353 bytes
    • Dataset size: 718755 bytes
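The split figures listed for the six configurations can be cross-checked with a few lines of arithmetic. The sketch below copies the numbers from the listing, confirms that each dataset size equals the sum of its split sizes, and totals the examples per configuration. The variable and function names are illustrative and not part of the dataset.

```python
# (num_examples, num_bytes) per split, copied from the listing.
SPLITS = {
    "multispeaker-ach": {"train": (4811, 704574), "dev": (101, 14750), "test": (96, 14497)},
    "multispeaker-eng": {"train": (4797, 734854), "dev": (100, 15282), "test": (96, 15194)},
    "multispeaker-lgg": {"train": (4811, 704330), "dev": (101, 14684), "test": (96, 14411)},
    "multispeaker-lug": {"train": (5016, 801734), "dev": (103, 16421), "test": (97, 16270)},
    "multispeaker-nyn": {"train": (4811, 700078), "dev": (101, 14574), "test": (96, 14351)},
    "multispeaker-teo": {"train": (4811, 690253), "dev": (101, 14389), "test": (96, 14113)},
}

# Reported dataset sizes in bytes, copied from the listing.
DATASET_SIZE = {
    "multispeaker-ach": 733821,
    "multispeaker-eng": 765330,
    "multispeaker-lgg": 733425,
    "multispeaker-lug": 834425,
    "multispeaker-nyn": 729003,
    "multispeaker-teo": 718755,
}


def total_examples(config: str) -> int:
    """Sum the example counts over train, dev and test for one configuration."""
    return sum(examples for examples, _ in SPLITS[config].values())


def total_bytes(config: str) -> int:
    """Sum the byte sizes over train, dev and test for one configuration."""
    return sum(size for _, size in SPLITS[config].values())


for config in SPLITS:
    # The reported dataset size should be the sum of the split sizes.
    assert total_bytes(config) == DATASET_SIZE[config], config
    print(f"{config}: {total_examples(config)} examples, {total_bytes(config)} bytes")

print("all configurations:", sum(total_examples(c) for c in SPLITS), "examples")
```

The dev and test splits are each about 2% of a configuration, so any metric computed on them rests on roughly 100 examples.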

Dataset file paths

  • multispeaker-ach

    • train: multispeaker-ach/train-*
    • dev: multispeaker-ach/dev-*
    • test: multispeaker-ach/test-*
  • multispeaker-eng

    • train: multispeaker-eng/train-*
    • dev: multispeaker-eng/dev-*
    • test: multispeaker-eng/test-*
  • multispeaker-lgg

    • train: multispeaker-lgg/train-*
    • dev: multispeaker-lgg/dev-*
    • test: multispeaker-lgg/test-*
  • multispeaker-lug

    • train: multispeaker-lug/train-*
    • dev: multispeaker-lug/dev-*
    • test: multispeaker-lug/test-*
  • multispeaker-nyn

    • train: multispeaker-nyn/train-*
    • dev: multispeaker-nyn/dev-*
    • test: multispeaker-nyn/test-*
  • multispeaker-teo

    • train: multispeaker-teo/train-*
    • dev: multispeaker-teo/dev-*
    • test: multispeaker-teo/test-*
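The paths above all follow the pattern config/split-*, so the mapping can be generated rather than typed out. The sketch below rebuilds it and shows one way a split might be loaded with the Hugging Face datasets library. The helper names data_files_for and load_split are hypothetical. Loading needs network access and the datasets package, so load_split is defined but not called here.

```python
REPO_ID = "jq/salt-asr-data-transcriptions"

CONFIGS = [
    "multispeaker-ach",
    "multispeaker-eng",
    "multispeaker-lgg",
    "multispeaker-lug",
    "multispeaker-nyn",
    "multispeaker-teo",
]

SPLIT_NAMES = ["train", "dev", "test"]


def data_files_for(config: str) -> dict:
    """Return the split -> glob pattern mapping listed for one configuration."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")
    return {split: f"{config}/{split}-*" for split in SPLIT_NAMES}


def load_split(config: str, split: str = "train"):
    """Download one split from the Hub (requires network and the datasets package)."""
    # Imported lazily so the rest of this sketch runs without the library installed.
    from datasets import load_dataset

    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")
    return load_dataset(REPO_ID, config, split=split)


for config in CONFIGS:
    print(config, data_files_for(config))
```

Note that the validation split is named dev, not validation, so split="dev" is the value to pass when loading it.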