
JST-SUPERB/MUSAN-speech_unit_part1

Hugging Face, updated 2024-07-10, indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/JST-SUPERB/MUSAN-speech_unit_part1
Summary:
This dataset contains eight splits; each split name indicates an audio codec configuration and sampling rate (for example dac_16k or academicodec_hifi_24k_320d). Its features include the speech input, a noisy audio field (noisy_-20dB), Whisper transcriptions of noisy audio at signal-to-noise ratios from 10 dB to -10 dB, Whisper transcriptions of the clean audio, and audio unit sequences for the clean and noisy audio. The download size is 9300844417 bytes (about 9.3 GB), and the total dataset size is 16579839734.800001 bytes (about 16.6 GB).
Provider:
JST-SUPERB
Original information summary

Dataset overview

Dataset name

MUSAN-speech_unit_part1

Dataset configuration

  • config_name: default
    • data_files:
      • split: academicodec_hifi_16k_320d
        • path: data/academicodec_hifi_16k_320d-*
      • split: academicodec_hifi_16k_320d_large_uni
        • path: data/academicodec_hifi_16k_320d_large_uni-*
      • split: academicodec_hifi_24k_320d
        • path: data/academicodec_hifi_24k_320d-*
      • split: audiodec_24k_320d
        • path: data/audiodec_24k_320d-*
      • split: dac_16k
        • path: data/dac_16k-*
      • split: dac_24k
        • path: data/dac_24k-*
      • split: dac_44k
        • path: data/dac_44k-*
      • split: speech_tokenizer_16k
        • path: data/speech_tokenizer_16k-*
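
The configuration above maps each split name to a file pattern of the form data/<split>-*. A minimal sketch of that mapping, together with one way a single split might be loaded through the Hugging Face `datasets` library, follows. The helper names `data_file_pattern` and `load_split` are hypothetical, and the choice of streaming mode is an assumption made here to avoid downloading the full dataset.

```python
REPO_ID = "JST-SUPERB/MUSAN-speech_unit_part1"

# Split names copied from the default config listed above.
SPLITS = [
    "academicodec_hifi_16k_320d",
    "academicodec_hifi_16k_320d_large_uni",
    "academicodec_hifi_24k_320d",
    "audiodec_24k_320d",
    "dac_16k",
    "dac_24k",
    "dac_44k",
    "speech_tokenizer_16k",
]


def data_file_pattern(split: str) -> str:
    """Return the glob pattern that the default config assigns to a split."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"data/{split}-*"


def load_split(split: str, streaming: bool = True):
    """Load one split from the Hub (hypothetical helper).

    Requires the `datasets` package and network access; the import is
    kept local so that the helpers above also work offline.
    """
    from datasets import load_dataset

    data_file_pattern(split)  # raises ValueError for an unknown split name
    return load_dataset(REPO_ID, split=split, streaming=streaming)


if __name__ == "__main__":
    for name in SPLITS:
        print(name, "->", data_file_pattern(name))
```

With streaming enabled, `load_split("dac_16k")` returns an iterable dataset, so examples can be inspected one at a time without fetching every shard first.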

Dataset features

  • name: speech_input
    • dtype: string
  • name: noisy_-20dB
    • dtype: audio
  • name: noisy_10dB_transcription_whisper-small.en
    • dtype: string
  • name: noisy_5dB_transcription_whisper-small.en
    • dtype: string
  • name: noisy_0dB_transcription_whisper-small.en
    • dtype: string
  • name: noisy_-5dB_transcription_whisper-small.en
    • dtype: string
  • name: noisy_-10dB_transcription_whisper-small.en
    • dtype: string
  • name: noisy_10dB_transcription_whisper-medium.en
    • dtype: string
  • name: noisy_5dB_transcription_whisper-medium.en
    • dtype: string
  • name: noisy_0dB_transcription_whisper-medium.en
    • dtype: string
  • name: noisy_-5dB_transcription_whisper-medium.en
    • dtype: string
  • name: noisy_-10dB_transcription_whisper-medium.en
    • dtype: string
  • name: noisy_10dB_transcription_whisper-large-v3
    • dtype: string
  • name: noisy_5dB_transcription_whisper-large-v3
    • dtype: string
  • name: noisy_0dB_transcription_whisper-large-v3
    • dtype: string
  • name: noisy_-5dB_transcription_whisper-large-v3
    • dtype: string
  • name: noisy_-10dB_transcription_whisper-large-v3
    • dtype: string
  • name: output
    • dtype: string
  • name: clean_audio_transcription_whisper-small.en
    • dtype: string
  • name: clean_audio_transcription_whisper-medium.en
    • dtype: string
  • name: clean_audio_transcription_whisper-large-v3
    • dtype: string
  • name: clean_audio_unit
    • sequence:
      • sequence: int64
  • name: noisy_10dB_unit
    • sequence:
      • sequence: int64
  • name: noisy_5dB_unit
    • sequence:
      • sequence: int64
  • name: noisy_0dB_unit
    • sequence:
      • sequence: int64
  • name: noisy_-5dB_unit
    • sequence:
      • sequence: int64
  • name: noisy_-10dB_unit
    • sequence:
      • sequence: int64
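
The transcription and unit feature names above follow a regular pattern built from the signal-to-noise ratio and the Whisper model name. The sketch below reconstructs those column names programmatically, which may be convenient when iterating over noise conditions. The function names are hypothetical and the pattern is inferred only from the feature list above.

```python
from typing import List, Optional

# SNR levels (in dB) and Whisper model names as they appear in the feature list.
SNR_LEVELS_DB = [10, 5, 0, -5, -10]
WHISPER_MODELS = ["whisper-small.en", "whisper-medium.en", "whisper-large-v3"]


def transcription_column(model: str, snr_db: Optional[int] = None) -> str:
    """Column name of a transcription; snr_db=None selects the clean audio."""
    if snr_db is None:
        return f"clean_audio_transcription_{model}"
    return f"noisy_{snr_db}dB_transcription_{model}"


def unit_column(snr_db: Optional[int] = None) -> str:
    """Column name of a unit sequence; snr_db=None selects the clean audio."""
    if snr_db is None:
        return "clean_audio_unit"
    return f"noisy_{snr_db}dB_unit"


def all_transcription_columns() -> List[str]:
    """All noisy transcription columns followed by the clean ones."""
    noisy = [transcription_column(m, s) for m in WHISPER_MODELS for s in SNR_LEVELS_DB]
    clean = [transcription_column(m) for m in WHISPER_MODELS]
    return noisy + clean


if __name__ == "__main__":
    for column in all_transcription_columns():
        print(column)
    print([unit_column(s) for s in [None] + SNR_LEVELS_DB])
```

Each unit column is declared as a sequence of int64 sequences, so a loaded value is a nested list of integers; what the outer and inner dimensions represent is not stated in the listing.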

Dataset splits

  • name: academicodec_hifi_16k_320d
    • num_bytes: 1360089979.85
    • num_examples: 5135
  • name: academicodec_hifi_16k_320d_large_uni
    • num_bytes: 1360089979.85
    • num_examples: 5135
  • name: academicodec_hifi_24k_320d
    • num_bytes: 1480598203.85
    • num_examples: 5135
  • name: audiodec_24k_320d
    • num_bytes: 1892270875.85
    • num_examples: 5135
  • name: dac_16k
    • num_bytes: 1998532027.85
    • num_examples: 5135
  • name: dac_24k
    • num_bytes: 4630613467.85
    • num_examples: 5135
  • name: dac_44k
    • num_bytes: 2254588147.85
    • num_examples: 5135
  • name: speech_tokenizer_16k
    • num_bytes: 1603057051.85
    • num_examples: 5135

Dataset size

  • download_size: 9300844417
  • dataset_size: 16579839734.800001
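
As a consistency check, the per-split num_bytes values can be summed and compared with the reported dataset_size. The sketch below uses only the figures listed on this page; the tolerance of 0.001 bytes is an assumption chosen to absorb floating-point rounding.

```python
import math

# num_bytes per split, copied from the split listing.
SPLIT_NUM_BYTES = {
    "academicodec_hifi_16k_320d": 1360089979.85,
    "academicodec_hifi_16k_320d_large_uni": 1360089979.85,
    "academicodec_hifi_24k_320d": 1480598203.85,
    "audiodec_24k_320d": 1892270875.85,
    "dac_16k": 1998532027.85,
    "dac_24k": 4630613467.85,
    "dac_44k": 2254588147.85,
    "speech_tokenizer_16k": 1603057051.85,
}
EXAMPLES_PER_SPLIT = 5135  # num_examples is the same for every split

DOWNLOAD_SIZE = 9300844417
DATASET_SIZE = 16579839734.800001

# fsum avoids accumulating rounding error across the eight terms.
total_bytes = math.fsum(SPLIT_NUM_BYTES.values())
sizes_match = math.isclose(total_bytes, DATASET_SIZE, rel_tol=0.0, abs_tol=1e-3)

total_examples = EXAMPLES_PER_SPLIT * len(SPLIT_NUM_BYTES)
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of split sizes: {total_bytes:.2f} bytes")
print(f"matches dataset_size: {sizes_match}")
print(f"total examples: {total_examples}")
print(f"download / dataset size: {download_ratio:.3f}")
```

The split sizes add up to the reported dataset_size, and the download is a little over half of that total.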