mshah1/speech_robust_bench
Source: Hugging Face | Updated: 2024-06-03 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/mshah1/speech_robust_bench
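The metadata on this page lists several configs (for example `accented_cv` and `librispeech_asr-test.clean`), each with named splits such as `gnoise.1` or `None.0`. The sketch below shows one way to work with those names in Python. It rests on assumptions: reading the trailing integer as a perturbation level is an interpretation of the naming pattern, not something the page states, and `parse_split` and `load_split` are hypothetical helper names. `load_split` uses the `datasets` library's `load_dataset` function, needs network access, and has not been run against this repository.

```python
from typing import NamedTuple, Optional

REPO_ID = "mshah1/speech_robust_bench"  # dataset name from the page header


class SplitName(NamedTuple):
    perturbation: Optional[str]  # None for the "None.0" split
    level: int                   # assumed meaning: perturbation level


def parse_split(name: str) -> SplitName:
    """Break a split name such as 'gnoise.3' into ('gnoise', 3).

    Assumption: the integer after the last dot is a level. Names without a
    numeric suffix (e.g. 'farfield', 'test.clean') raise ValueError.
    """
    base, sep, suffix = name.rpartition(".")
    if not sep or not suffix.isdigit():
        raise ValueError(f"not a '<perturbation>.<level>' split name: {name!r}")
    return SplitName(None if base == "None" else base, int(suffix))


def load_split(config: str, split: str, streaming: bool = True):
    """Load one split of one config from the Hugging Face Hub.

    Requires the `datasets` package and network access. The import is lazy
    so that parse_split stays usable without it. Streaming avoids fetching
    a whole config, some of which are tens of gigabytes.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)


if __name__ == "__main__":
    print(parse_split("gnoise.3"))  # SplitName(perturbation='gnoise', level=3)
    print(parse_split("None.0"))    # SplitName(perturbation=None, level=0)
```

With `datasets` installed, a call such as `load_split("librispeech_asr-test.clean", "gnoise.1")` would request that single split; decoding the `audio` column may need additional audio packages.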
Resource description:
---
dataset_info:
- config_name: accented_cv
features:
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: age
dtype: string
- name: gender
dtype: string
- name: accents
dtype: string
- name: locale
dtype: string
- name: id
dtype: int64
splits:
- name: test
num_bytes: 55407854.085
num_examples: 1355
- name: test.clean
num_bytes: 25593824.0
num_examples: 640
download_size: 78598662
dataset_size: 81001678.08500001
- config_name: chime
features:
- name: audio
dtype: audio
- name: end_time
dtype: string
- name: start_time
dtype: string
- name: speaker
dtype: string
- name: ref
dtype: string
- name: location
dtype: string
- name: session_id
dtype: string
- name: text
dtype: string
splits:
- name: farfield
num_bytes: 521160936.31
num_examples: 6535
- name: nearfield
num_bytes: 1072274621.0799999
num_examples: 6535
download_size: 1532887016
dataset_size: 1593435557.3899999
- config_name: in-the-wild
features:
- name: audio
dtype: audio
- name: end_time
dtype: string
- name: start_time
dtype: string
- name: speaker
dtype: string
- name: ref
dtype: string
- name: location
dtype: string
- name: session_id
dtype: string
- name: id
dtype: string
- name: text
dtype: string
splits:
- name: farfield
num_bytes: 521363521.31
num_examples: 6535
- name: nearfield
num_bytes: 1072477206.0799999
num_examples: 6535
download_size: 1533124839
dataset_size: 1593840727.3899999
- config_name: in-the-wild-AMI
features:
- name: meeting_id
dtype: string
- name: id
dtype: string
- name: text
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: begin_time
dtype: float32
- name: end_time
dtype: float32
- name: microphone_id
dtype: string
- name: speaker_id
dtype: string
splits:
- name: nearfield
num_bytes: 1382749390.9785259
num_examples: 6584
- name: farfield
num_bytes: 1040706691.1008185
num_examples: 6584
download_size: 2164898498
dataset_size: 2423456082.0793443
- config_name: in-the-wild-ami
features:
- name: meeting_id
dtype: string
- name: audio_id
dtype: string
- name: text
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: begin_time
dtype: float32
- name: end_time
dtype: float32
- name: microphone_id
dtype: string
- name: speaker_id
dtype: string
splits:
- name: nearfield
num_bytes: 1382749390.9785259
num_examples: 6584
- name: farfield
num_bytes: 1040706691.1008185
num_examples: 6584
download_size: 2164900274
dataset_size: 2423456082.0793443
- config_name: librispeech_asr-test.clean
features:
- name: file
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: speaker_id
dtype: int64
- name: chapter_id
dtype: int64
- name: id
dtype: string
splits:
- name: speedup.1
num_bytes: 498896619.34
num_examples: 2620
- name: speedup.2
num_bytes: 415901075.34
num_examples: 2620
- name: speedup.3
num_bytes: 356617835.34
num_examples: 2620
- name: speedup.4
num_bytes: 312152811.34
num_examples: 2620
- name: slowdown.1
num_bytes: 712320343.34
num_examples: 2620
- name: slowdown.2
num_bytes: 830887339.34
num_examples: 2620
- name: slowdown.3
num_bytes: 996880127.34
num_examples: 2620
- name: slowdown.4
num_bytes: 1245871847.34
num_examples: 2620
- name: pitch_up.3
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_up.4
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_down.1
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_down.2
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_down.3
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_down.4
num_bytes: 623392467.34
num_examples: 2620
- name: pitch_up.1
num_bytes: 623392458.5
num_examples: 2620
- name: pitch_up.2
num_bytes: 623392458.5
num_examples: 2620
- name: resample.1
num_bytes: 623392535.34
num_examples: 2620
- name: resample.2
num_bytes: 623392535.34
num_examples: 2620
- name: resample.3
num_bytes: 623392579.34
num_examples: 2620
- name: resample.4
num_bytes: 623392623.34
num_examples: 2620
- name: voice_conversion.4
num_bytes: 799852214.5
num_examples: 2620
- name: voice_conversion.3
num_bytes: 580185782.5
num_examples: 2620
- name: voice_conversion.1
num_bytes: 589259446.5
num_examples: 2620
- name: voice_conversion.2
num_bytes: 571175606.5
num_examples: 2620
- name: gain.1
num_bytes: 623392467.34
num_examples: 2620
- name: gain.2
num_bytes: 623392467.34
num_examples: 2620
- name: gain.3
num_bytes: 623392467.34
num_examples: 2620
- name: echo.1
num_bytes: 633872467.34
num_examples: 2620
- name: echo.2
num_bytes: 644352467.34
num_examples: 2620
- name: echo.3
num_bytes: 665312467.34
num_examples: 2620
- name: echo.4
num_bytes: 707232467.34
num_examples: 2620
- name: phaser.1
num_bytes: 623392467.34
num_examples: 2620
- name: phaser.2
num_bytes: 623392467.34
num_examples: 2620
- name: phaser.3
num_bytes: 623392467.34
num_examples: 2620
- name: tempo_up.1
num_bytes: 498896595.34
num_examples: 2620
- name: tempo_up.2
num_bytes: 415899351.34
num_examples: 2620
- name: tempo_up.3
num_bytes: 356615595.34
num_examples: 2620
- name: tempo_up.4
num_bytes: 312152811.34
num_examples: 2620
- name: tempo_down.1
num_bytes: 712318083.34
num_examples: 2620
- name: tempo_down.2
num_bytes: 830885583.34
num_examples: 2620
- name: tempo_down.3
num_bytes: 996880103.34
num_examples: 2620
- name: tempo_down.4
num_bytes: 1245871847.34
num_examples: 2620
- name: gain.4
num_bytes: 623392467.34
num_examples: 2620
- name: phaser.4
num_bytes: 623392467.34
num_examples: 2620
- name: lowpass.1
num_bytes: 623392467.34
num_examples: 2620
- name: lowpass.2
num_bytes: 623392467.34
num_examples: 2620
- name: lowpass.3
num_bytes: 623392467.34
num_examples: 2620
- name: lowpass.4
num_bytes: 623392467.34
num_examples: 2620
- name: highpass.1
num_bytes: 623392467.34
num_examples: 2620
- name: highpass.2
num_bytes: 623392467.34
num_examples: 2620
- name: highpass.3
num_bytes: 623392467.34
num_examples: 2620
- name: highpass.4
num_bytes: 623392467.34
num_examples: 2620
- name: voice_conversion_vctk.1
num_bytes: 495165825.88
num_examples: 2620
- name: universal_adv.1
num_bytes: 623392467.34
num_examples: 2620
- name: rir.1
num_bytes: 705636818.5
num_examples: 2620
- name: rir.2
num_bytes: 744484818.5
num_examples: 2620
- name: rir.3
num_bytes: 758740818.5
num_examples: 2620
- name: rir.4
num_bytes: 776116818.5
num_examples: 2620
- name: gnoise.1
num_bytes: 623392455.88
num_examples: 2620
- name: gnoise.2
num_bytes: 623392455.88
num_examples: 2620
- name: gnoise.3
num_bytes: 623392455.88
num_examples: 2620
- name: gnoise.4
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_esc50.1
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_esc50.2
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_esc50.3
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_esc50.4
num_bytes: 623392455.88
num_examples: 2620
- name: music.1
num_bytes: 623392455.88
num_examples: 2620
- name: music.2
num_bytes: 623392455.88
num_examples: 2620
- name: music.3
num_bytes: 623392455.88
num_examples: 2620
- name: music.4
num_bytes: 623392455.88
num_examples: 2620
- name: crosstalk.1
num_bytes: 623392455.88
num_examples: 2620
- name: crosstalk.2
num_bytes: 623392455.88
num_examples: 2620
- name: crosstalk.3
num_bytes: 623392455.88
num_examples: 2620
- name: crosstalk.4
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_musan.1
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_musan.2
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_musan.3
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_musan.4
num_bytes: 623392455.88
num_examples: 2620
- name: real_rir.1
num_bytes: 638169615.88
num_examples: 2620
- name: real_rir.2
num_bytes: 694281819.88
num_examples: 2620
- name: real_rir.3
num_bytes: 713200537.88
num_examples: 2620
- name: real_rir.4
num_bytes: 1515177725.88
num_examples: 2620
- name: env_noise.1
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise.2
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise.3
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise.4
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_wham.1
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_wham.2
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_wham.3
num_bytes: 623392455.88
num_examples: 2620
- name: env_noise_wham.4
num_bytes: 623392455.88
num_examples: 2620
- name: tremolo.1
num_bytes: 623392455.88
num_examples: 2620
- name: tremolo.2
num_bytes: 623392455.88
num_examples: 2620
- name: tremolo.3
num_bytes: 623392455.88
num_examples: 2620
- name: tremolo.4
num_bytes: 623392455.88
num_examples: 2620
- name: treble.1
num_bytes: 623392455.88
num_examples: 2620
- name: treble.2
num_bytes: 623392455.88
num_examples: 2620
- name: treble.3
num_bytes: 623392455.88
num_examples: 2620
- name: treble.4
num_bytes: 623392455.88
num_examples: 2620
- name: bass.1
num_bytes: 623392455.88
num_examples: 2620
- name: bass.2
num_bytes: 623392455.88
num_examples: 2620
- name: bass.3
num_bytes: 623392455.88
num_examples: 2620
- name: bass.4
num_bytes: 623392455.88
num_examples: 2620
- name: chorus.1
num_bytes: 626913735.88
num_examples: 2620
- name: chorus.2
num_bytes: 628590535.88
num_examples: 2620
- name: chorus.3
num_bytes: 630267335.88
num_examples: 2620
- name: chorus.4
num_bytes: 631944135.88
num_examples: 2620
- name: None.0
num_bytes: 367982506.42
num_examples: 2620
download_size: 67547733720
dataset_size: 68871044112.51988
- config_name: librispeech_asr-test.clean_pertEval_500_30
features:
- name: file
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: speaker_id
dtype: int64
- name: chapter_id
dtype: int64
- name: id
dtype: string
- name: pert_idx
dtype: int64
splits:
- name: gnoise.1
num_bytes: 3592401090.0
num_examples: 15000
- name: env_noise_esc50.1
num_bytes: 3592401090.0
num_examples: 15000
download_size: 7170899040
dataset_size: 7184802180.0
- config_name: multilingual_librispeech-spanish_test
features:
- name: file
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: speaker_id
dtype: int64
- name: chapter_id
dtype: int64
- name: id
dtype: string
splits:
- name: None.0
num_bytes: 596762288.01
num_examples: 2385
- name: env_noise.1
num_bytes: 1153485830.17
num_examples: 2385
- name: env_noise.2
num_bytes: 1153485830.17
num_examples: 2385
- name: env_noise.3
num_bytes: 1153485830.17
num_examples: 2385
- name: env_noise.4
num_bytes: 1153485830.17
num_examples: 2385
- name: rir.1
num_bytes: 1268493860.17
num_examples: 2385
- name: rir.2
num_bytes: 1252109860.17
num_examples: 2385
- name: rir.3
num_bytes: 1249517860.17
num_examples: 2385
- name: rir.4
num_bytes: 1222893860.17
num_examples: 2385
- name: speedup.1
num_bytes: 923001764.17
num_examples: 2385
- name: speedup.2
num_bytes: 769347364.17
num_examples: 2385
- name: speedup.3
num_bytes: 659593516.17
num_examples: 2385
- name: speedup.4
num_bytes: 577275652.17
num_examples: 2385
- name: slowdown.1
num_bytes: 1318119422.17
num_examples: 2385
- name: slowdown.2
num_bytes: 1537627530.17
num_examples: 2385
- name: slowdown.3
num_bytes: 1844938056.17
num_examples: 2385
- name: slowdown.4
num_bytes: 2305906194.17
num_examples: 2385
- name: pitch_up.3
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_up.4
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_down.1
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_down.2
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_down.3
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_down.4
num_bytes: 1153485830.17
num_examples: 2385
- name: pitch_up.1
num_bytes: 1153485821.72
num_examples: 2385
- name: pitch_up.2
num_bytes: 1153485821.72
num_examples: 2385
- name: resample.2
num_bytes: 1153485842.17
num_examples: 2385
- name: gain.1
num_bytes: 1153485830.17
num_examples: 2385
- name: gain.2
num_bytes: 1153485830.17
num_examples: 2385
- name: gain.3
num_bytes: 1153485830.17
num_examples: 2385
- name: gain.4
num_bytes: 1153485830.17
num_examples: 2385
- name: echo.1
num_bytes: 1163025830.17
num_examples: 2385
- name: echo.2
num_bytes: 1172565830.17
num_examples: 2385
- name: echo.3
num_bytes: 1191645830.17
num_examples: 2385
- name: echo.4
num_bytes: 1229805830.17
num_examples: 2385
- name: tempo_up.1
num_bytes: 923001758.17
num_examples: 2385
- name: tempo_up.2
num_bytes: 769345632.17
num_examples: 2385
- name: tempo_up.3
num_bytes: 659591372.17
num_examples: 2385
- name: tempo_up.4
num_bytes: 577275652.17
num_examples: 2385
- name: tempo_down.1
num_bytes: 1318117252.17
num_examples: 2385
- name: tempo_down.2
num_bytes: 1537626028.17
num_examples: 2385
- name: tempo_down.3
num_bytes: 1844938048.17
num_examples: 2385
- name: tempo_down.4
num_bytes: 2305906194.17
num_examples: 2385
- name: phaser.1
num_bytes: 1153485830.17
num_examples: 2385
- name: phaser.2
num_bytes: 1153485830.17
num_examples: 2385
- name: phaser.3
num_bytes: 1153485830.17
num_examples: 2385
- name: phaser.4
num_bytes: 1153485830.17
num_examples: 2385
- name: resample.1
num_bytes: 1153485840.17
num_examples: 2385
- name: resample.3
num_bytes: 1153485850.17
num_examples: 2385
- name: resample.4
num_bytes: 1153485882.17
num_examples: 2385
- name: lowpass.1
num_bytes: 1153485830.17
num_examples: 2385
- name: lowpass.2
num_bytes: 1153485830.17
num_examples: 2385
- name: lowpass.3
num_bytes: 1153485830.17
num_examples: 2385
- name: lowpass.4
num_bytes: 1153485830.17
num_examples: 2385
- name: highpass.1
num_bytes: 1153485830.17
num_examples: 2385
- name: highpass.2
num_bytes: 1153485830.17
num_examples: 2385
- name: highpass.3
num_bytes: 1153485830.17
num_examples: 2385
- name: highpass.4
num_bytes: 1153485830.17
num_examples: 2385
- name: gnoise.1
num_bytes: 1153485822.49
num_examples: 2385
- name: gnoise.2
num_bytes: 1153485822.49
num_examples: 2385
- name: gnoise.3
num_bytes: 1153485822.49
num_examples: 2385
- name: gnoise.4
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_esc50.1
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_esc50.2
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_esc50.3
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_esc50.4
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_musan.1
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_musan.2
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_musan.3
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_musan.4
num_bytes: 1153485822.49
num_examples: 2385
- name: music.1
num_bytes: 1153485822.49
num_examples: 2385
- name: music.2
num_bytes: 1153485822.49
num_examples: 2385
- name: music.3
num_bytes: 1153485822.49
num_examples: 2385
- name: music.4
num_bytes: 1153485822.49
num_examples: 2385
- name: crosstalk.1
num_bytes: 1153485822.49
num_examples: 2385
- name: crosstalk.2
num_bytes: 1153485822.49
num_examples: 2385
- name: crosstalk.3
num_bytes: 1153485822.49
num_examples: 2385
- name: crosstalk.4
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_wham.1
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_wham.2
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_wham.3
num_bytes: 1153485822.49
num_examples: 2385
- name: env_noise_wham.4
num_bytes: 1153485822.49
num_examples: 2385
- name: tremolo.1
num_bytes: 1153485822.49
num_examples: 2385
- name: tremolo.2
num_bytes: 1153485822.49
num_examples: 2385
- name: tremolo.4
num_bytes: 1153485822.49
num_examples: 2385
- name: treble.1
num_bytes: 1153485822.49
num_examples: 2385
- name: treble.2
num_bytes: 1153485822.49
num_examples: 2385
- name: treble.3
num_bytes: 1153485822.49
num_examples: 2385
- name: treble.4
num_bytes: 1153485822.49
num_examples: 2385
- name: bass.1
num_bytes: 1153485822.49
num_examples: 2385
- name: bass.2
num_bytes: 1153485822.49
num_examples: 2385
- name: bass.3
num_bytes: 1153485822.49
num_examples: 2385
- name: bass.4
num_bytes: 1153485822.49
num_examples: 2385
- name: chorus.1
num_bytes: 1156691262.49
num_examples: 2385
- name: chorus.2
num_bytes: 1158217662.49
num_examples: 2385
- name: chorus.3
num_bytes: 1159744062.49
num_examples: 2385
- name: chorus.4
num_bytes: 1161270462.49
num_examples: 2385
- name: tremolo.3
num_bytes: 1153485822.49
num_examples: 2385
download_size: 117646635522
dataset_size: 113291392188.23016
- config_name: multilingual_librispeech-spanish_test_pertEval_500_30
features:
- name: file
dtype: string
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: speaker_id
dtype: int64
- name: chapter_id
dtype: int64
- name: id
dtype: string
- name: pert_idx
dtype: int64
splits:
- name: gnoise.1
num_bytes: 7341021960.0
num_examples: 15000
- name: env_noise_esc50.1
num_bytes: 7341021960.0
num_examples: 15000
download_size: 14645523867
dataset_size: 14682043920.0
- config_name: tedlium-release3_test
features:
- name: audio
dtype:
audio:
sampling_rate: 16000
- name: text
dtype: string
- name: speaker_id
dtype: string
- name: gender
dtype:
class_label:
names:
'0': unknown
'1': female
'2': male
- name: file
dtype: string
- name: id
dtype: string
splits:
- name: None.0
num_bytes: 277376247.9680054
num_examples: 1155
- name: speedup.1
num_bytes: 221990159.49965963
num_examples: 1155
- name: speedup.2
num_bytes: 185066240.47311097
num_examples: 1155
- name: speedup.3
num_bytes: 158691929.4792376
num_examples: 1155
- name: slowdown.1
num_bytes: 316938966.95371
num_examples: 1155
- name: slowdown.2
num_bytes: 369687787.0762423
num_examples: 1155
- name: slowdown.3
num_bytes: 443535996.23893803
num_examples: 1155
- name: pitch_up.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_up.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_up.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_down.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_down.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_down.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: rir.1
num_bytes: 313788218.1586113
num_examples: 1155
- name: rir.2
num_bytes: 330268000.32334924
num_examples: 1155
- name: rir.3
num_bytes: 336608313.46153843
num_examples: 1155
- name: voice_conversion_vctk.1
num_bytes: 216990920.87134105
num_examples: 1155
- name: resample.1
num_bytes: 277376301.4329476
num_examples: 1155
- name: resample.2
num_bytes: 277376301.4329476
num_examples: 1155
- name: resample.3
num_bytes: 277376354.89788973
num_examples: 1155
- name: gain.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: gain.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: gain.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: echo.1
num_bytes: 281996247.9680054
num_examples: 1155
- name: echo.2
num_bytes: 286616247.9680054
num_examples: 1155
- name: echo.3
num_bytes: 295856247.9680054
num_examples: 1155
- name: phaser.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: phaser.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: phaser.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: tempo_up.1
num_bytes: 221989786.81756297
num_examples: 1155
- name: tempo_up.2
num_bytes: 185065496.68141592
num_examples: 1155
- name: tempo_up.3
num_bytes: 158690987.55275697
num_examples: 1155
- name: tempo_down.1
num_bytes: 316938020.3097345
num_examples: 1155
- name: tempo_down.2
num_bytes: 369686999.254595
num_examples: 1155
- name: tempo_down.3
num_bytes: 443535631.41933286
num_examples: 1155
- name: lowpass.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: lowpass.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: lowpass.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: highpass.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: highpass.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: highpass.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: speedup.4
num_bytes: 138910125.75561607
num_examples: 1155
- name: slowdown.4
num_bytes: 554308545.8577263
num_examples: 1155
- name: pitch_up.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: pitch_down.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: rir.4
num_bytes: 345514943.8223281
num_examples: 1155
- name: resample.4
num_bytes: 277376474.4077604
num_examples: 1155
- name: gain.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: echo.4
num_bytes: 314336247.9680054
num_examples: 1155
- name: phaser.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: tempo_up.4
num_bytes: 138910125.75561607
num_examples: 1155
- name: tempo_down.4
num_bytes: 554308545.8577263
num_examples: 1155
- name: lowpass.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: highpass.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: gnoise.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: gnoise.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: gnoise.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: music.1
num_bytes: 301958728.16
num_examples: 1155
- name: music.2
num_bytes: 301958728.16
num_examples: 1155
- name: music.3
num_bytes: 301958728.16
num_examples: 1155
- name: music.4
num_bytes: 301958728.16
num_examples: 1155
- name: crosstalk.1
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_esc50.1
num_bytes: 277376247.9680054
num_examples: 1155
- name: env_noise_esc50.2
num_bytes: 277376247.9680054
num_examples: 1155
- name: env_noise_esc50.3
num_bytes: 277376247.9680054
num_examples: 1155
- name: gnoise.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: crosstalk.2
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_esc50.4
num_bytes: 277376247.9680054
num_examples: 1155
- name: crosstalk.3
num_bytes: 301958728.16
num_examples: 1155
- name: crosstalk.4
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_musan.1
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_musan.2
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_musan.3
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_musan.4
num_bytes: 301958728.16
num_examples: 1155
- name: real_rir.1
num_bytes: 308750878.16
num_examples: 1155
- name: real_rir.2
num_bytes: 333286988.16
num_examples: 1155
- name: real_rir.3
num_bytes: 341205738.16
num_examples: 1155
- name: real_rir.4
num_bytes: 715155314.16
num_examples: 1155
- name: env_noise.1
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise.2
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise.3
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise.4
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_wham.1
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_wham.2
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_wham.3
num_bytes: 301958728.16
num_examples: 1155
- name: env_noise_wham.4
num_bytes: 301958728.16
num_examples: 1155
- name: tremolo.1
num_bytes: 301958728.16
num_examples: 1155
- name: tremolo.2
num_bytes: 301958728.16
num_examples: 1155
- name: tremolo.3
num_bytes: 301958728.16
num_examples: 1155
- name: tremolo.4
num_bytes: 301958728.16
num_examples: 1155
- name: treble.1
num_bytes: 301958728.16
num_examples: 1155
- name: treble.2
num_bytes: 301958728.16
num_examples: 1155
- name: treble.3
num_bytes: 301958728.16
num_examples: 1155
- name: treble.4
num_bytes: 301958728.16
num_examples: 1155
- name: bass.1
num_bytes: 301958728.16
num_examples: 1155
- name: bass.2
num_bytes: 301958728.16
num_examples: 1155
- name: bass.3
num_bytes: 301958728.16
num_examples: 1155
- name: bass.4
num_bytes: 301958728.16
num_examples: 1155
- name: chorus.1
num_bytes: 303511048.16
num_examples: 1155
- name: chorus.2
num_bytes: 304250248.16
num_examples: 1155
- name: chorus.4
num_bytes: 305728648.16
num_examples: 1155
- name: chorus.3
num_bytes: 304989448.16
num_examples: 1155
download_size: 58723208514
dataset_size: 30342709961.007984
configs:
- config_name: accented_cv
data_files:
- split: test
path: accented_cv/test-*
- split: test.clean
path: accented_cv/test.clean-*
- config_name: chime
data_files:
- split: farfield
path: chime/farfield-*
- split: nearfield
path: chime/nearfield-*
- config_name: in-the-wild
data_files:
- split: farfield
path: in-the-wild/farfield-*
- split: nearfield
path: in-the-wild/nearfield-*
- config_name: in-the-wild-AMI
data_files:
- split: nearfield
path: in-the-wild-AMI/nearfield-*
- split: farfield
path: in-the-wild-AMI/farfield-*
- config_name: in-the-wild-ami
data_files:
- split: nearfield
path: in-the-wild-ami/nearfield-*
- split: farfield
path: in-the-wild-ami/farfield-*
- config_name: librispeech_asr-test.clean
data_files:
- split: None.0
path: librispeech_asr-test.clean/None.0-*
- split: gnoise.1
path: librispeech_asr-test.clean/gnoise.1-*
- split: gnoise.2
path: librispeech_asr-test.clean/gnoise.2-*
- split: gnoise.3
path: librispeech_asr-test.clean/gnoise.3-*
- split: gnoise.4
path: librispeech_asr-test.clean/gnoise.4-*
- split: env_noise.1
path: librispeech_asr-test.clean/env_noise.1-*
- split: env_noise.2
path: librispeech_asr-test.clean/env_noise.2-*
- split: env_noise.3
path: librispeech_asr-test.clean/env_noise.3-*
- split: env_noise.4
path: librispeech_asr-test.clean/env_noise.4-*
- split: rir.1
path: librispeech_asr-test.clean/rir.1-*
- split: rir.2
path: librispeech_asr-test.clean/rir.2-*
- split: rir.3
path: librispeech_asr-test.clean/rir.3-*
- split: rir.4
path: librispeech_asr-test.clean/rir.4-*
- split: speedup.1
path: librispeech_asr-test.clean/speedup.1-*
- split: speedup.2
path: librispeech_asr-test.clean/speedup.2-*
- split: speedup.3
path: librispeech_asr-test.clean/speedup.3-*
- split: speedup.4
path: librispeech_asr-test.clean/speedup.4-*
- split: slowdown.1
path: librispeech_asr-test.clean/slowdown.1-*
- split: slowdown.2
path: librispeech_asr-test.clean/slowdown.2-*
- split: slowdown.3
path: librispeech_asr-test.clean/slowdown.3-*
- split: slowdown.4
path: librispeech_asr-test.clean/slowdown.4-*
- split: pitch_up.3
path: librispeech_asr-test.clean/pitch_up.3-*
- split: pitch_up.4
path: librispeech_asr-test.clean/pitch_up.4-*
- split: pitch_down.1
path: librispeech_asr-test.clean/pitch_down.1-*
- split: pitch_down.2
path: librispeech_asr-test.clean/pitch_down.2-*
- split: pitch_down.3
path: librispeech_asr-test.clean/pitch_down.3-*
- split: pitch_down.4
path: librispeech_asr-test.clean/pitch_down.4-*
- split: pitch_up.1
path: librispeech_asr-test.clean/pitch_up.1-*
- split: pitch_up.2
path: librispeech_asr-test.clean/pitch_up.2-*
- split: resample.1
path: librispeech_asr-test.clean/resample.1-*
- split: resample.2
path: librispeech_asr-test.clean/resample.2-*
- split: resample.3
path: librispeech_asr-test.clean/resample.3-*
- split: resample.4
path: librispeech_asr-test.clean/resample.4-*
- split: env_noise_esc50.1
path: librispeech_asr-test.clean/env_noise_esc50.1-*
- split: env_noise_esc50.2
path: librispeech_asr-test.clean/env_noise_esc50.2-*
- split: env_noise_esc50.3
path: librispeech_asr-test.clean/env_noise_esc50.3-*
- split: env_noise_esc50.4
path: librispeech_asr-test.clean/env_noise_esc50.4-*
- split: voice_conversion.4
path: librispeech_asr-test.clean/voice_conversion.4-*
- split: voice_conversion.3
path: librispeech_asr-test.clean/voice_conversion.3-*
- split: voice_conversion.1
path: librispeech_asr-test.clean/voice_conversion.1-*
- split: voice_conversion.2
path: librispeech_asr-test.clean/voice_conversion.2-*
- split: gain.1
path: librispeech_asr-test.clean/gain.1-*
- split: gain.2
path: librispeech_asr-test.clean/gain.2-*
- split: gain.3
path: librispeech_asr-test.clean/gain.3-*
- split: echo.1
path: librispeech_asr-test.clean/echo.1-*
- split: echo.2
path: librispeech_asr-test.clean/echo.2-*
- split: echo.3
path: librispeech_asr-test.clean/echo.3-*
- split: echo.4
path: librispeech_asr-test.clean/echo.4-*
- split: phaser.1
path: librispeech_asr-test.clean/phaser.1-*
- split: phaser.2
path: librispeech_asr-test.clean/phaser.2-*
- split: phaser.3
path: librispeech_asr-test.clean/phaser.3-*
- split: tempo_up.1
path: librispeech_asr-test.clean/tempo_up.1-*
- split: tempo_up.2
path: librispeech_asr-test.clean/tempo_up.2-*
- split: tempo_up.3
path: librispeech_asr-test.clean/tempo_up.3-*
- split: tempo_up.4
path: librispeech_asr-test.clean/tempo_up.4-*
- split: tempo_down.1
path: librispeech_asr-test.clean/tempo_down.1-*
- split: tempo_down.2
path: librispeech_asr-test.clean/tempo_down.2-*
- split: tempo_down.3
path: librispeech_asr-test.clean/tempo_down.3-*
- split: tempo_down.4
path: librispeech_asr-test.clean/tempo_down.4-*
- split: gain.4
path: librispeech_asr-test.clean/gain.4-*
- split: lowpass.1
path: librispeech_asr-test.clean/lowpass.1-*
- split: lowpass.2
path: librispeech_asr-test.clean/lowpass.2-*
- split: lowpass.3
path: librispeech_asr-test.clean/lowpass.3-*
- split: lowpass.4
path: librispeech_asr-test.clean/lowpass.4-*
- split: highpass.1
path: librispeech_asr-test.clean/highpass.1-*
- split: highpass.2
path: librispeech_asr-test.clean/highpass.2-*
- split: highpass.3
path: librispeech_asr-test.clean/highpass.3-*
- split: highpass.4
path: librispeech_asr-test.clean/highpass.4-*
- split: phaser.4
path: librispeech_asr-test.clean/phaser.4-*
- split: voice_conversion_vctk.1
path: librispeech_asr-test.clean/voice_conversion_vctk.1-*
- split: universal_adv.1
path: librispeech_asr-test.clean/universal_adv.1-*
- split: music.1
path: librispeech_asr-test.clean/music.1-*
- split: music.2
path: librispeech_asr-test.clean/music.2-*
- split: music.3
path: librispeech_asr-test.clean/music.3-*
- split: music.4
path: librispeech_asr-test.clean/music.4-*
- split: crosstalk.1
path: librispeech_asr-test.clean/crosstalk.1-*
- split: crosstalk.2
path: librispeech_asr-test.clean/crosstalk.2-*
- split: crosstalk.3
path: librispeech_asr-test.clean/crosstalk.3-*
- split: crosstalk.4
path: librispeech_asr-test.clean/crosstalk.4-*
- split: env_noise_musan.1
path: librispeech_asr-test.clean/env_noise_musan.1-*
- split: env_noise_musan.2
path: librispeech_asr-test.clean/env_noise_musan.2-*
- split: env_noise_musan.3
path: librispeech_asr-test.clean/env_noise_musan.3-*
- split: env_noise_musan.4
path: librispeech_asr-test.clean/env_noise_musan.4-*
- split: real_rir.1
path: librispeech_asr-test.clean/real_rir.1-*
- split: real_rir.2
path: librispeech_asr-test.clean/real_rir.2-*
- split: real_rir.3
path: librispeech_asr-test.clean/real_rir.3-*
- split: real_rir.4
path: librispeech_asr-test.clean/real_rir.4-*
- split: env_noise_wham.1
path: librispeech_asr-test.clean/env_noise_wham.1-*
- split: env_noise_wham.2
path: librispeech_asr-test.clean/env_noise_wham.2-*
- split: env_noise_wham.3
path: librispeech_asr-test.clean/env_noise_wham.3-*
- split: env_noise_wham.4
path: librispeech_asr-test.clean/env_noise_wham.4-*
- split: tremolo.1
path: librispeech_asr-test.clean/tremolo.1-*
- split: tremolo.2
path: librispeech_asr-test.clean/tremolo.2-*
- split: tremolo.3
path: librispeech_asr-test.clean/tremolo.3-*
- split: tremolo.4
path: librispeech_asr-test.clean/tremolo.4-*
- split: treble.1
path: librispeech_asr-test.clean/treble.1-*
- split: treble.2
path: librispeech_asr-test.clean/treble.2-*
- split: treble.3
path: librispeech_asr-test.clean/treble.3-*
- split: treble.4
path: librispeech_asr-test.clean/treble.4-*
- split: bass.1
path: librispeech_asr-test.clean/bass.1-*
- split: bass.2
path: librispeech_asr-test.clean/bass.2-*
- split: bass.3
path: librispeech_asr-test.clean/bass.3-*
- split: bass.4
path: librispeech_asr-test.clean/bass.4-*
- split: chorus.1
path: librispeech_asr-test.clean/chorus.1-*
- split: chorus.2
path: librispeech_asr-test.clean/chorus.2-*
- split: chorus.3
path: librispeech_asr-test.clean/chorus.3-*
- split: chorus.4
path: librispeech_asr-test.clean/chorus.4-*
- config_name: librispeech_asr-test.clean_pertEval_500_30
data_files:
- split: gnoise.1
path: librispeech_asr-test.clean_pertEval_500_30/gnoise.1-*
- split: env_noise_esc50.1
path: librispeech_asr-test.clean_pertEval_500_30/env_noise_esc50.1-*
- config_name: multilingual_librispeech-spanish_test
data_files:
- split: None.0
path: multilingual_librispeech-spanish_test/None.0-*
- split: gnoise.1
path: multilingual_librispeech-spanish_test/gnoise.1-*
- split: gnoise.2
path: multilingual_librispeech-spanish_test/gnoise.2-*
- split: gnoise.3
path: multilingual_librispeech-spanish_test/gnoise.3-*
- split: gnoise.4
path: multilingual_librispeech-spanish_test/gnoise.4-*
- split: env_noise.1
path: multilingual_librispeech-spanish_test/env_noise.1-*
- split: env_noise.2
path: multilingual_librispeech-spanish_test/env_noise.2-*
- split: env_noise.3
path: multilingual_librispeech-spanish_test/env_noise.3-*
- split: env_noise.4
path: multilingual_librispeech-spanish_test/env_noise.4-*
- split: rir.1
path: multilingual_librispeech-spanish_test/rir.1-*
- split: rir.2
path: multilingual_librispeech-spanish_test/rir.2-*
- split: rir.3
path: multilingual_librispeech-spanish_test/rir.3-*
- split: rir.4
path: multilingual_librispeech-spanish_test/rir.4-*
- split: speedup.1
path: multilingual_librispeech-spanish_test/speedup.1-*
- split: speedup.2
path: multilingual_librispeech-spanish_test/speedup.2-*
- split: speedup.3
path: multilingual_librispeech-spanish_test/speedup.3-*
- split: speedup.4
path: multilingual_librispeech-spanish_test/speedup.4-*
- split: slowdown.1
path: multilingual_librispeech-spanish_test/slowdown.1-*
- split: slowdown.2
path: multilingual_librispeech-spanish_test/slowdown.2-*
- split: slowdown.3
path: multilingual_librispeech-spanish_test/slowdown.3-*
- split: slowdown.4
path: multilingual_librispeech-spanish_test/slowdown.4-*
- split: pitch_up.3
path: multilingual_librispeech-spanish_test/pitch_up.3-*
- split: pitch_up.4
path: multilingual_librispeech-spanish_test/pitch_up.4-*
- split: pitch_down.1
path: multilingual_librispeech-spanish_test/pitch_down.1-*
- split: pitch_down.2
path: multilingual_librispeech-spanish_test/pitch_down.2-*
- split: pitch_down.3
path: multilingual_librispeech-spanish_test/pitch_down.3-*
- split: pitch_down.4
path: multilingual_librispeech-spanish_test/pitch_down.4-*
- split: pitch_up.1
path: multilingual_librispeech-spanish_test/pitch_up.1-*
- split: pitch_up.2
path: multilingual_librispeech-spanish_test/pitch_up.2-*
- split: resample.2
path: multilingual_librispeech-spanish_test/resample.2-*
- split: resample.3
path: multilingual_librispeech-spanish_test/resample.3-*
- split: resample.4
path: multilingual_librispeech-spanish_test/resample.4-*
- split: env_noise_esc50.1
path: multilingual_librispeech-spanish_test/env_noise_esc50.1-*
- split: env_noise_esc50.2
path: multilingual_librispeech-spanish_test/env_noise_esc50.2-*
- split: env_noise_esc50.3
path: multilingual_librispeech-spanish_test/env_noise_esc50.3-*
- split: env_noise_esc50.4
path: multilingual_librispeech-spanish_test/env_noise_esc50.4-*
- split: resample.1
path: multilingual_librispeech-spanish_test/resample.1-*
- split: gain.1
path: multilingual_librispeech-spanish_test/gain.1-*
- split: gain.2
path: multilingual_librispeech-spanish_test/gain.2-*
- split: gain.3
path: multilingual_librispeech-spanish_test/gain.3-*
- split: gain.4
path: multilingual_librispeech-spanish_test/gain.4-*
- split: echo.4
path: multilingual_librispeech-spanish_test/echo.4-*
- split: echo.1
path: multilingual_librispeech-spanish_test/echo.1-*
- split: echo.2
path: multilingual_librispeech-spanish_test/echo.2-*
- split: echo.3
path: multilingual_librispeech-spanish_test/echo.3-*
- split: tempo_up.1
path: multilingual_librispeech-spanish_test/tempo_up.1-*
- split: tempo_up.2
path: multilingual_librispeech-spanish_test/tempo_up.2-*
- split: tempo_up.3
path: multilingual_librispeech-spanish_test/tempo_up.3-*
- split: tempo_up.4
path: multilingual_librispeech-spanish_test/tempo_up.4-*
- split: tempo_down.1
path: multilingual_librispeech-spanish_test/tempo_down.1-*
- split: tempo_down.2
path: multilingual_librispeech-spanish_test/tempo_down.2-*
- split: tempo_down.3
path: multilingual_librispeech-spanish_test/tempo_down.3-*
- split: tempo_down.4
path: multilingual_librispeech-spanish_test/tempo_down.4-*
- split: lowpass.1
path: multilingual_librispeech-spanish_test/lowpass.1-*
- split: lowpass.2
path: multilingual_librispeech-spanish_test/lowpass.2-*
- split: lowpass.3
path: multilingual_librispeech-spanish_test/lowpass.3-*
- split: lowpass.4
path: multilingual_librispeech-spanish_test/lowpass.4-*
- split: highpass.1
path: multilingual_librispeech-spanish_test/highpass.1-*
- split: highpass.2
path: multilingual_librispeech-spanish_test/highpass.2-*
- split: highpass.3
path: multilingual_librispeech-spanish_test/highpass.3-*
- split: highpass.4
path: multilingual_librispeech-spanish_test/highpass.4-*
- split: phaser.1
path: multilingual_librispeech-spanish_test/phaser.1-*
- split: phaser.2
path: multilingual_librispeech-spanish_test/phaser.2-*
- split: phaser.3
path: multilingual_librispeech-spanish_test/phaser.3-*
- split: phaser.4
path: multilingual_librispeech-spanish_test/phaser.4-*
- split: env_noise_musan.1
path: multilingual_librispeech-spanish_test/env_noise_musan.1-*
- split: env_noise_musan.2
path: multilingual_librispeech-spanish_test/env_noise_musan.2-*
- split: env_noise_musan.3
path: multilingual_librispeech-spanish_test/env_noise_musan.3-*
- split: env_noise_musan.4
path: multilingual_librispeech-spanish_test/env_noise_musan.4-*
- split: music.1
path: multilingual_librispeech-spanish_test/music.1-*
- split: music.2
path: multilingual_librispeech-spanish_test/music.2-*
- split: music.3
path: multilingual_librispeech-spanish_test/music.3-*
- split: music.4
path: multilingual_librispeech-spanish_test/music.4-*
- split: crosstalk.1
path: multilingual_librispeech-spanish_test/crosstalk.1-*
- split: crosstalk.2
path: multilingual_librispeech-spanish_test/crosstalk.2-*
- split: crosstalk.3
path: multilingual_librispeech-spanish_test/crosstalk.3-*
- split: crosstalk.4
path: multilingual_librispeech-spanish_test/crosstalk.4-*
- split: env_noise_wham.1
path: multilingual_librispeech-spanish_test/env_noise_wham.1-*
- split: env_noise_wham.2
path: multilingual_librispeech-spanish_test/env_noise_wham.2-*
- split: env_noise_wham.3
path: multilingual_librispeech-spanish_test/env_noise_wham.3-*
- split: env_noise_wham.4
path: multilingual_librispeech-spanish_test/env_noise_wham.4-*
- split: tremolo.1
path: multilingual_librispeech-spanish_test/tremolo.1-*
- split: tremolo.2
path: multilingual_librispeech-spanish_test/tremolo.2-*
- split: tremolo.4
path: multilingual_librispeech-spanish_test/tremolo.4-*
- split: treble.1
path: multilingual_librispeech-spanish_test/treble.1-*
- split: treble.2
path: multilingual_librispeech-spanish_test/treble.2-*
- split: treble.3
path: multilingual_librispeech-spanish_test/treble.3-*
- split: treble.4
path: multilingual_librispeech-spanish_test/treble.4-*
- split: bass.1
path: multilingual_librispeech-spanish_test/bass.1-*
- split: bass.2
path: multilingual_librispeech-spanish_test/bass.2-*
- split: bass.3
path: multilingual_librispeech-spanish_test/bass.3-*
- split: bass.4
path: multilingual_librispeech-spanish_test/bass.4-*
- split: chorus.1
path: multilingual_librispeech-spanish_test/chorus.1-*
- split: chorus.2
path: multilingual_librispeech-spanish_test/chorus.2-*
- split: chorus.3
path: multilingual_librispeech-spanish_test/chorus.3-*
- split: chorus.4
path: multilingual_librispeech-spanish_test/chorus.4-*
- split: tremolo.3
path: multilingual_librispeech-spanish_test/tremolo.3-*
- config_name: multilingual_librispeech-spanish_test_pertEval_500_30
data_files:
- split: gnoise.1
path: multilingual_librispeech-spanish_test_pertEval_500_30/gnoise.1-*
- split: env_noise_esc50.1
path: multilingual_librispeech-spanish_test_pertEval_500_30/env_noise_esc50.1-*
- config_name: tedlium-release3_test
data_files:
- split: gnoise.1
path: tedlium-release3_test/gnoise.1-*
- split: gnoise.2
path: tedlium-release3_test/gnoise.2-*
- split: gnoise.3
path: tedlium-release3_test/gnoise.3-*
- split: env_noise_esc50.1
path: tedlium-release3_test/env_noise_esc50.1-*
- split: env_noise_esc50.2
path: tedlium-release3_test/env_noise_esc50.2-*
- split: env_noise_esc50.3
path: tedlium-release3_test/env_noise_esc50.3-*
- split: speedup.1
path: tedlium-release3_test/speedup.1-*
- split: speedup.2
path: tedlium-release3_test/speedup.2-*
- split: speedup.3
path: tedlium-release3_test/speedup.3-*
- split: slowdown.1
path: tedlium-release3_test/slowdown.1-*
- split: slowdown.2
path: tedlium-release3_test/slowdown.2-*
- split: slowdown.3
path: tedlium-release3_test/slowdown.3-*
- split: pitch_up.1
path: tedlium-release3_test/pitch_up.1-*
- split: pitch_up.2
path: tedlium-release3_test/pitch_up.2-*
- split: pitch_up.3
path: tedlium-release3_test/pitch_up.3-*
- split: pitch_down.1
path: tedlium-release3_test/pitch_down.1-*
- split: pitch_down.2
path: tedlium-release3_test/pitch_down.2-*
- split: pitch_down.3
path: tedlium-release3_test/pitch_down.3-*
- split: rir.1
path: tedlium-release3_test/rir.1-*
- split: rir.2
path: tedlium-release3_test/rir.2-*
- split: rir.3
path: tedlium-release3_test/rir.3-*
- split: voice_conversion_vctk.1
path: tedlium-release3_test/voice_conversion_vctk.1-*
- split: resample.1
path: tedlium-release3_test/resample.1-*
- split: resample.2
path: tedlium-release3_test/resample.2-*
- split: resample.3
path: tedlium-release3_test/resample.3-*
- split: gain.1
path: tedlium-release3_test/gain.1-*
- split: gain.2
path: tedlium-release3_test/gain.2-*
- split: gain.3
path: tedlium-release3_test/gain.3-*
- split: echo.1
path: tedlium-release3_test/echo.1-*
- split: echo.2
path: tedlium-release3_test/echo.2-*
- split: echo.3
path: tedlium-release3_test/echo.3-*
- split: phaser.1
path: tedlium-release3_test/phaser.1-*
- split: phaser.2
path: tedlium-release3_test/phaser.2-*
- split: phaser.3
path: tedlium-release3_test/phaser.3-*
- split: tempo_up.1
path: tedlium-release3_test/tempo_up.1-*
- split: tempo_up.2
path: tedlium-release3_test/tempo_up.2-*
- split: tempo_up.3
path: tedlium-release3_test/tempo_up.3-*
- split: tempo_down.1
path: tedlium-release3_test/tempo_down.1-*
- split: tempo_down.2
path: tedlium-release3_test/tempo_down.2-*
- split: tempo_down.3
path: tedlium-release3_test/tempo_down.3-*
- split: lowpass.1
path: tedlium-release3_test/lowpass.1-*
- split: lowpass.2
path: tedlium-release3_test/lowpass.2-*
- split: lowpass.3
path: tedlium-release3_test/lowpass.3-*
- split: highpass.1
path: tedlium-release3_test/highpass.1-*
- split: highpass.2
path: tedlium-release3_test/highpass.2-*
- split: highpass.3
path: tedlium-release3_test/highpass.3-*
- split: gnoise.4
path: tedlium-release3_test/gnoise.4-*
- split: env_noise_esc50.4
path: tedlium-release3_test/env_noise_esc50.4-*
- split: speedup.4
path: tedlium-release3_test/speedup.4-*
- split: slowdown.4
path: tedlium-release3_test/slowdown.4-*
- split: pitch_up.4
path: tedlium-release3_test/pitch_up.4-*
- split: pitch_down.4
path: tedlium-release3_test/pitch_down.4-*
- split: rir.4
path: tedlium-release3_test/rir.4-*
- split: resample.4
path: tedlium-release3_test/resample.4-*
- split: gain.4
path: tedlium-release3_test/gain.4-*
- split: echo.4
path: tedlium-release3_test/echo.4-*
- split: phaser.4
path: tedlium-release3_test/phaser.4-*
- split: tempo_up.4
path: tedlium-release3_test/tempo_up.4-*
- split: tempo_down.4
path: tedlium-release3_test/tempo_down.4-*
- split: lowpass.4
path: tedlium-release3_test/lowpass.4-*
- split: highpass.4
path: tedlium-release3_test/highpass.4-*
- split: None.0
path: tedlium-release3_test/None.0-*
- split: music.1
path: tedlium-release3_test/music.1-*
- split: music.2
path: tedlium-release3_test/music.2-*
- split: music.3
path: tedlium-release3_test/music.3-*
- split: music.4
path: tedlium-release3_test/music.4-*
- split: crosstalk.1
path: tedlium-release3_test/crosstalk.1-*
- split: crosstalk.2
path: tedlium-release3_test/crosstalk.2-*
- split: crosstalk.3
path: tedlium-release3_test/crosstalk.3-*
- split: crosstalk.4
path: tedlium-release3_test/crosstalk.4-*
- split: env_noise_musan.1
path: tedlium-release3_test/env_noise_musan.1-*
- split: env_noise_musan.2
path: tedlium-release3_test/env_noise_musan.2-*
- split: env_noise_musan.3
path: tedlium-release3_test/env_noise_musan.3-*
- split: env_noise_musan.4
path: tedlium-release3_test/env_noise_musan.4-*
- split: real_rir.1
path: tedlium-release3_test/real_rir.1-*
- split: real_rir.2
path: tedlium-release3_test/real_rir.2-*
- split: real_rir.3
path: tedlium-release3_test/real_rir.3-*
- split: real_rir.4
path: tedlium-release3_test/real_rir.4-*
- split: env_noise.1
path: tedlium-release3_test/env_noise.1-*
- split: env_noise.2
path: tedlium-release3_test/env_noise.2-*
- split: env_noise.3
path: tedlium-release3_test/env_noise.3-*
- split: env_noise.4
path: tedlium-release3_test/env_noise.4-*
- split: env_noise_wham.1
path: tedlium-release3_test/env_noise_wham.1-*
- split: env_noise_wham.2
path: tedlium-release3_test/env_noise_wham.2-*
- split: env_noise_wham.3
path: tedlium-release3_test/env_noise_wham.3-*
- split: env_noise_wham.4
path: tedlium-release3_test/env_noise_wham.4-*
- split: tremolo.1
path: tedlium-release3_test/tremolo.1-*
- split: tremolo.2
path: tedlium-release3_test/tremolo.2-*
- split: tremolo.3
path: tedlium-release3_test/tremolo.3-*
- split: tremolo.4
path: tedlium-release3_test/tremolo.4-*
- split: treble.1
path: tedlium-release3_test/treble.1-*
- split: treble.2
path: tedlium-release3_test/treble.2-*
- split: treble.3
path: tedlium-release3_test/treble.3-*
- split: treble.4
path: tedlium-release3_test/treble.4-*
- split: bass.1
path: tedlium-release3_test/bass.1-*
- split: bass.2
path: tedlium-release3_test/bass.2-*
- split: bass.3
path: tedlium-release3_test/bass.3-*
- split: bass.4
path: tedlium-release3_test/bass.4-*
- split: chorus.1
path: tedlium-release3_test/chorus.1-*
- split: chorus.2
path: tedlium-release3_test/chorus.2-*
- split: chorus.4
path: tedlium-release3_test/chorus.4-*
- split: chorus.3
path: tedlium-release3_test/chorus.3-*
---
# Dataset Card for "speech_robust_bench"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
mshah1
Summary of original information
Dataset overview
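Dataset configuration accented_cv
- Features:
  - audio: sampling rate 16000 Hz
  - text: string
  - age: string
  - gender: string
  - accents: string
  - locale: string
  - id: integer (int64)
- Splits:
  - test: 55407854.085 bytes, 1355 examples
  - test.clean: 25593824.0 bytes, 640 examples
- Download size: 78598662 bytes
- Dataset size: 81001678.08500001 bytes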
Dataset configuration chime
- Features:
  - audio: audio
  - end_time: string
  - start_time: string
  - speaker: string
  - ref: string
  - location: string
  - session_id: string
  - text: string
- Splits:
  - farfield: 521160936.31 bytes, 6535 examples
  - nearfield: 1072274621.0799999 bytes, 6535 examples
- Download size: 1532887016 bytes
- Dataset size: 1593435557.3899999 bytes
Dataset configuration in-the-wild
- Features:
  - audio: audio
  - end_time: string
  - start_time: string
  - speaker: string
  - ref: string
  - location: string
  - session_id: string
  - id: string
  - text: string
- Splits:
  - farfield: 521363521.31 bytes, 6535 examples
  - nearfield: 1072477206.0799999 bytes, 6535 examples
- Download size: 1533124839 bytes
- Dataset size: 1593840727.3899999 bytes
Dataset configuration in-the-wild-AMI
- Features:
  - meeting_id: string
  - id: string
  - text: string
  - audio: sampling rate 16000 Hz
  - begin_time: float (float32)
  - end_time: float (float32)
  - microphone_id: string
  - speaker_id: string
- Splits:
  - nearfield: 1382749390.9785259 bytes, 6584 examples
  - farfield: 1040706691.1008185 bytes, 6584 examples
- Download size: 2164898498 bytes
- Dataset size: 2423456082.0793443 bytes
Dataset configuration in-the-wild-ami
- Features:
  - meeting_id: string
  - audio_id: string
  - text: string
  - audio: sampling rate 16000 Hz
  - begin_time: float (float32)
  - end_time: float (float32)
  - microphone_id: string
  - speaker_id: string
- Splits:
  - nearfield: 1382749390.9785259 bytes, 6584 examples
  - farfield: 1040706691.1008185 bytes, 6584 examples
- Download size: 2164900274 bytes
- Dataset size: 2423456082.0793443 bytes
Dataset configuration librispeech_asr-test.clean
- Features:
  - file: string
  - audio: sampling rate 16000 Hz
  - text: string
  - speaker_id: integer (int64)
  - chapter_id: integer (int64)
  - id: string
- Splits:
  - speedup.1: 498896619.34 bytes, 2620 examples
  - speedup.2: 415901075.34 bytes, 2620 examples
  - speedup.3: 356617835.34 bytes, 2620 examples
  - speedup.4: 312152811.34 bytes, 2620 examples
  - slowdown.1: 712320343.34 bytes, 2620 examples
  - slowdown.2: 830887339.34 bytes, 2620 examples
  - slowdown.3: 996880127.34 bytes, 2620 examples
  - slowdown.4: 1245871847.34 bytes, 2620 examples
  - pitch_up.3: 623392467.34 bytes, 2620 examples
  - pitch_up.4: 623392467.34 bytes, 2620 examples
  - pitch_down.1: 623392467.34 bytes, 2620 examples
  - pitch_down.2: 623392467.34 bytes, 2620 examples
  - pitch_down.3: 623392467.34 bytes, 2620 examples
  - pitch_down.4: 623392467.34 bytes, 2620 examples
  - pitch_up.1: 623392458.5 bytes, 2620 examples
  - pitch_up.2: 623392458.5 bytes, 2620 examples
  - resample.1: 623392535.34 bytes, 2620 examples
  - resample.2: 623392535.34 bytes, 2620 examples
  - resample.3: 623392579.34 bytes, 2620 examples
  - resample.4: 623392623.34 bytes, 2620 examples
  - voice_conversion.4: 799852214.5 bytes, 2620 examples
  - voice_conversion.3: 580185782.5 bytes, 2620 examples
  - voice_conversion.1: 589259446.5 bytes, 2620 examples
  - voice_conversion.2: 571175606.5 bytes, 2620 examples
  - gain.1: 623392467.34 bytes, 2620 examples
  - gain.2: 623392467.34 bytes, 2620 examples
  - gain.3: 623392467.34 bytes, 2620 examples
  - echo.1: 633872467.34 bytes, 2620 examples
  - echo.2: 644352467.34 bytes, 2620 examples
  - echo.3: 665312467.34 bytes, 2620 examples
  - echo.4: 707232467.34 bytes, 2620 examples
  - phaser.1: 623392467.34 bytes, 2620 examples
  - phaser.2: 623392467.34 bytes, 2620 examples
  - phaser.3: 623392467.34 bytes, 2620 examples
  - tempo_up.1: 498896595.34 bytes, 2620 examples
  - tempo_up.2: 415899351.34 bytes, 2620 examples
  - tempo_up.3: 356615595.34 bytes, 2620 examples
  - tempo_up.4: 312152811.34 bytes, 2620 examples
  - tempo_down.1: 712318083.34 bytes, 2620 examples
  - tempo_down.2: 830885583.34 bytes, 2620 examples
  - tempo_down.3: 996880103.34 bytes, 2620 examples
  - tempo_down.4: 1245871847.34 bytes, 2620 examples
  - gain.4: 623392467.34 bytes, 2620 examples
  - phaser.4: 623392467.34 bytes, 2620 examples
  - lowpass.1: 623392467.34 bytes, 2620 examples
  - lowpass.2: 623392467.34 bytes, 2620 examples
  - lowpass.3: 623392467.34 bytes, 2620 examples
  - lowpass.4: 623392467.34 bytes, 2620 examples
  - highpass.1: 623392467.34 bytes, 2620 examples
  - highpass.2: 623392467.34 bytes, 2620 examples
  - highpass.3: 623392467.34 bytes, 2620 examples
  - highpass.4: 623392467.34 bytes, 2620 examples
  - voice_conversion_vctk.1: 495165825.88 bytes, 2620 examples
  - universal_adv.1: 623392467.34 bytes, 2620 examples
  - rir.1: 705636818.5 bytes, 2620 examples
  - rir.2: 744484818.5 bytes, 2620 examples
  - rir.3: 758740818.5 bytes, 2620 examples
  - rir.4: 776116818.5 bytes, 2620 examples
  - gnoise.1: 623392455.88 bytes, 2620 examples
  - gnoise.2: 623392455.88 bytes, 2620 examples
  - gnoise.3: 623392455.88 bytes, 2620 examples
  - gnoise.4: 623392455.88 bytes, 2620 examples
  - env_noise_esc50.1: 623392455.88 bytes, 2620 examples
  - env_noise_esc50.2: 623392455.88 bytes, 2620 examples
  - env_noise_esc50.3: 623392455.88 bytes, 2620 examples
  - env_noise_esc50.4: 623392455.88 bytes, 2620 examples
  - music.1: 623392455.88 bytes, 2620 examples
  - music.2: 623392455.88 bytes, 2620 examples
  - music.3: 623392455.88 bytes, 2620 examples
  - music.4: 623392455.88 bytes, 2620 examples
  - crosstalk.1: 623392455.88 bytes, 2620 examples
  - crosstalk.2: 623392455.88 bytes, 2620 examples
  - crosstalk.3: 623392455.88 bytes, 2620 examples
  - crosstalk.4: 623392455.88 bytes, 2620 examples
  - env_noise_musan.1: 623392455.88 bytes, 2620 examples
  - env_noise_musan.2: 623392455.88 bytes, 2620 examples
  - env_noise_musan.3: 623392455.88 bytes, 2620 examples
  - env_noise_musan.4: 623392455.88 bytes, 2620 examples
  - real_rir.1: 638169615.88 bytes, 2620 examples
  - real_rir.2: 694281819.88 bytes, 2620 examples
  - real_rir.3: 713
Collected summary
Dataset introduction

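Construction
In speech recognition, evaluating a model's robustness across diverse acoustic conditions is essential. The Speech Robust Bench dataset builds a comprehensive test bed by combining several well-known speech corpora, such as LibriSpeech, Common Voice, and CHiME. Its construction applies systematic acoustic perturbations to the original audio, including speaking-rate changes, pitch shifts, added environmental noise, and simulated reverberation. Each perturbation is configured at several severity levels (most split names end in a level suffix from `.1` to `.4`), so the resulting test samples have a controlled degree of degradation. This tiered perturbation strategy lets the dataset cover a broad range of the acoustic variation encountered in real-world use.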
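To make the idea of a perturbation with graded severity concrete, here is a minimal sketch that adds white Gaussian noise to a waveform at a target signal-to-noise ratio. It is not the pipeline used to build the dataset: the function name `add_noise_at_snr`, the `SEVERITY_TO_SNR_DB` mapping, and the SNR values are all assumptions chosen for illustration.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB SNR."""
    rng = random.Random(seed)
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


# Hypothetical severity ladder (assumed values): lower SNR means stronger noise.
SEVERITY_TO_SNR_DB = {1: 20, 2: 10, 3: 5, 4: 0}

# A one-second 440 Hz tone at 16 kHz stands in for a speech waveform.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy_by_severity = {
    level: add_noise_at_snr(clean, snr_db)
    for level, snr_db in SEVERITY_TO_SNR_DB.items()
}
```

Fixing the seed keeps each severity level reproducible, which matters when the same perturbed audio is meant to be shared across model comparisons.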
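Features
The dataset's core features are its carefully designed perturbation dimensions and its metadata. It provides subsets covering dozens of acoustic variations, including speech rate, pitch, background noise, and room impulse responses. Some configurations also carry contextual fields, such as speaker age, gender, and accent (`accented_cv`) or recording location (`chime`, `in-the-wild`). Every sample is paired with a text transcription, and the configurations that declare a sampling rate use 16 kHz. Together these allow fine-grained analysis of how a speech recognition system performs under different kinds of degradation and provide a solid basis for studying the failure modes of model robustness.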
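Usage
Researchers can use the dataset to benchmark automatic speech recognition models systematically. Loading a specific configuration name, such as `accented_cv` or `librispeech_asr-test.clean`, gives access to the corresponding original or perturbed test splits. A typical workflow loads the data with the Hugging Face `datasets` library, runs the model on the audio, and compares its output against the text labels to compute metrics such as word error rate. The dataset's structured layout makes it straightforward to compare a model across perturbation types and severities, which helps identify weak points and guide subsequent robustness work.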
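The workflow above might be sketched as follows. The repository id, the configuration name, the `gnoise.1` split, and the `audio` and `text` fields come from the listing earlier on this page. The `transcribe` callable, the lowercase normalization, and the assumption that decoded audio exposes `"array"` and `"sampling_rate"` keys are illustrative and may need adjusting for a given model or `datasets` version.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word sequences."""
    # prev[j] = distance between the ref words seen so far and the first j hyp words.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = 0
    ref_words_total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += word_edit_distance(ref_words, hyp.split())
        ref_words_total += len(ref_words)
    if ref_words_total == 0:
        raise ValueError("references contain no words")
    return errors / ref_words_total


def evaluate_split(transcribe, config="librispeech_asr-test.clean", split="gnoise.1"):
    """WER of a hypothetical `transcribe(array, sampling_rate) -> str` on one split.

    Requires network access and the `datasets` package.
    """
    from datasets import load_dataset  # lazy import keeps the WER helpers usable offline

    ds = load_dataset("mshah1/speech_robust_bench", config, split=split)
    references, hypotheses = [], []
    for example in ds:
        audio = example["audio"]  # assumed to expose "array" and "sampling_rate"
        hypotheses.append(transcribe(audio["array"], audio["sampling_rate"]).lower())
        references.append(example["text"].lower())  # simple normalization, an assumption
    return word_error_rate(references, hypotheses)
```

The WER helpers are kept separate from the loading code so they can be checked on plain strings before any audio is downloaded.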
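Background and challenges
Background overview
As automatic speech recognition matures, models perform very well in laboratory conditions, yet their robustness in complex real-world acoustic environments remains a serious challenge. The Speech Robust Bench dataset, built by the researcher mshah1, aims to systematically evaluate how speech recognition systems perform under diverse acoustic perturbations. It combines several classic speech corpora, such as LibriSpeech, CHiME, and multilingual variants, and introduces a wide range of perturbation types, including background noise, reverberation, accent variation, and signal-processing distortions, giving a standardized test bed for evaluating the robustness of speech recognition models. Its core research question is how to quantify a model's ability to generalize under non-ideal conditions, which supports the move of speech technology from the laboratory to practical applications and helps improve the usability and reliability of speech systems.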
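Current challenges
The dataset targets the robustness of automatic speech recognition in complex acoustic environments, specifically the sensitivity of models to background noise, accent variation, reverberation, and various audio distortions. During construction, the main difficulty lies in generating and annotating acoustic perturbations systematically: for example, accurately simulating noise mixing as it occurs in real environments, collecting and balancing accented and multilingual data, and ensuring that perturbation parameters (such as signal-to-noise ratio and reverberation time) are covered sensibly and consistently. Integrating source corpora of different origins while keeping data formats and annotations uniform is also a complex engineering task. Together, these challenges reflect how hard it is to build a comprehensive, reliable, and reproducible benchmark for speech robustness.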
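Common scenarios
Classic usage scenarios
In speech recognition, Speech Robust Bench is commonly used to evaluate model performance across diverse acoustic environments. By combining multiple accents, noise interference, and acoustic transformations, the dataset simulates complex real-world speech conditions and gives researchers a standardized test bed for systematically checking the robustness of automatic speech recognition systems.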
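Because split names in this dataset follow a `<perturbation>.<severity>` pattern (for example `gnoise.1` or `env_noise_esc50.4`), per-split scores can be grouped into a robustness report. The sketch below assumes that `None.0` is the unperturbed baseline split, which the listing suggests but does not state. The helper names and the WER numbers are made up for illustration and are not measured results.

```python
from collections import defaultdict


def parse_split(name):
    """Split 'env_noise_esc50.3' into ('env_noise_esc50', 3)."""
    perturbation, _, severity = name.rpartition(".")
    return perturbation, int(severity)


def degradation_by_perturbation(wer_by_split, baseline_split="None.0"):
    """Map perturbation -> {severity: WER increase over the baseline split}."""
    baseline = wer_by_split[baseline_split]
    table = defaultdict(dict)
    for name, wer in wer_by_split.items():
        if name == baseline_split:
            continue
        perturbation, severity = parse_split(name)
        table[perturbation][severity] = wer - baseline
    return dict(table)


# Made-up WER values purely for illustration; they are not measured results.
example_wers = {"None.0": 0.05, "gnoise.1": 0.07, "gnoise.4": 0.30, "rir.2": 0.12}
report = degradation_by_perturbation(example_wers)
```

Reporting the increase over the baseline, rather than raw WER, separates a model's sensitivity to each perturbation from its accuracy on clean audio.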
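Derived related work
The dataset has given rise to several lines of research, including robustness enhancement through adversarial training, multi-task learning frameworks, and domain generalization strategies. This work has deepened the understanding of robustness mechanisms in speech recognition and has led to techniques such as noise-invariant feature extraction and adaptive acoustic modeling, which continue to advance the field of speech processing.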
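Recent research on the dataset
Latest research directions
As model performance on standard speech recognition datasets approaches saturation, research attention is shifting toward evaluating robustness in complex real-world environments. By combining multiple accents, noise, reverberation, and acoustic perturbations, the Speech Robust Bench dataset provides a systematic benchmark for this work. Current research focuses on using the dataset to develop adversarial training strategies and multimodal fusion methods that improve generalization in noisy conditions and across accents. These efforts improve the reliability of speech technology in practical applications and lay groundwork for evaluating the fairness of cross-lingual speech processing systems.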
The content above was collected and summarized by 遇见数据集.



