mshah1/speech_robust_bench

Name: mshah1/speech_robust_bench
Creator: mshah1
Published: 2024-06-03 04:27:21
License: No description available

Source: Hugging Face. Updated 2024-06-03. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/mshah1/speech_robust_bench
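
The split names in the dataset metadata on this page mostly follow a `<perturbation>.<number>` pattern, for example `gnoise.1`, `env_noise_esc50.4`, or `None.0`. The sketch below shows one way to build and parse such names and to request a single split. It rests on several assumptions: reading the trailing number as a severity level is an interpretation the page does not confirm, the helper names `make_split`, `parse_split`, and `load_split` are hypothetical, and `load_split` needs the third-party `datasets` package plus network access to the Hugging Face Hub (or a mirror).

```python
from typing import Tuple

# Repository ID as shown on this page.
REPO_ID = "mshah1/speech_robust_bench"


def make_split(perturbation: str, level: int) -> str:
    """Build a split name such as 'gnoise.1' (hypothetical helper)."""
    return f"{perturbation}.{level}"


def parse_split(split: str) -> Tuple[str, int]:
    """Split 'env_noise_esc50.4' into ('env_noise_esc50', 4).

    Names whose suffix is not a number (for example 'test.clean' or
    'farfield') do not follow the pattern and raise ValueError.
    """
    perturbation, _, level = split.rpartition(".")
    if not perturbation or not level.isdigit():
        raise ValueError(f"not a '<perturbation>.<number>' split: {split!r}")
    return perturbation, int(level)


def load_split(config: str, perturbation: str, level: int, streaming: bool = True):
    """Load one split of one config (assumes `datasets` is installed)."""
    # Imported lazily so the pure helpers above work without the package.
    from datasets import load_dataset

    return load_dataset(
        REPO_ID,
        config,
        split=make_split(perturbation, level),
        streaming=streaming,  # avoids downloading the full split up front
    )


if __name__ == "__main__":
    print(parse_split("gnoise.1"))
    # Example call (requires network access):
    # ds = load_split("librispeech_asr-test.clean", "gnoise", 1)
```

Streaming is used as the default here because many splits are listed at several hundred megabytes or more; pass `streaming=False` to download and cache a split instead.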

Dataset summary:

--- dataset_info: - config_name: accented_cv features: - name: audio dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: age dtype: string - name: gender dtype: string - name: accents dtype: string - name: locale dtype: string - name: id dtype: int64 splits: - name: test num_bytes: 55407854.085 num_examples: 1355 - name: test.clean num_bytes: 25593824.0 num_examples: 640 download_size: 78598662 dataset_size: 81001678.08500001 - config_name: chime features: - name: audio dtype: audio - name: end_time dtype: string - name: start_time dtype: string - name: speaker dtype: string - name: ref dtype: string - name: location dtype: string - name: session_id dtype: string - name: text dtype: string splits: - name: farfield num_bytes: 521160936.31 num_examples: 6535 - name: nearfield num_bytes: 1072274621.0799999 num_examples: 6535 download_size: 1532887016 dataset_size: 1593435557.3899999 - config_name: in-the-wild features: - name: audio dtype: audio - name: end_time dtype: string - name: start_time dtype: string - name: speaker dtype: string - name: ref dtype: string - name: location dtype: string - name: session_id dtype: string - name: id dtype: string - name: text dtype: string splits: - name: farfield num_bytes: 521363521.31 num_examples: 6535 - name: nearfield num_bytes: 1072477206.0799999 num_examples: 6535 download_size: 1533124839 dataset_size: 1593840727.3899999 - config_name: in-the-wild-AMI features: - name: meeting_id dtype: string - name: id dtype: string - name: text dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: begin_time dtype: float32 - name: end_time dtype: float32 - name: microphone_id dtype: string - name: speaker_id dtype: string splits: - name: nearfield num_bytes: 1382749390.9785259 num_examples: 6584 - name: farfield num_bytes: 1040706691.1008185 num_examples: 6584 download_size: 2164898498 dataset_size: 2423456082.0793443 - config_name: in-the-wild-ami features: - name: meeting_id dtype: string - name: 
audio_id dtype: string - name: text dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: begin_time dtype: float32 - name: end_time dtype: float32 - name: microphone_id dtype: string - name: speaker_id dtype: string splits: - name: nearfield num_bytes: 1382749390.9785259 num_examples: 6584 - name: farfield num_bytes: 1040706691.1008185 num_examples: 6584 download_size: 2164900274 dataset_size: 2423456082.0793443 - config_name: librispeech_asr-test.clean features: - name: file dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: speaker_id dtype: int64 - name: chapter_id dtype: int64 - name: id dtype: string splits: - name: speedup.1 num_bytes: 498896619.34 num_examples: 2620 - name: speedup.2 num_bytes: 415901075.34 num_examples: 2620 - name: speedup.3 num_bytes: 356617835.34 num_examples: 2620 - name: speedup.4 num_bytes: 312152811.34 num_examples: 2620 - name: slowdown.1 num_bytes: 712320343.34 num_examples: 2620 - name: slowdown.2 num_bytes: 830887339.34 num_examples: 2620 - name: slowdown.3 num_bytes: 996880127.34 num_examples: 2620 - name: slowdown.4 num_bytes: 1245871847.34 num_examples: 2620 - name: pitch_up.3 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_up.4 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_down.1 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_down.2 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_down.3 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_down.4 num_bytes: 623392467.34 num_examples: 2620 - name: pitch_up.1 num_bytes: 623392458.5 num_examples: 2620 - name: pitch_up.2 num_bytes: 623392458.5 num_examples: 2620 - name: resample.1 num_bytes: 623392535.34 num_examples: 2620 - name: resample.2 num_bytes: 623392535.34 num_examples: 2620 - name: resample.3 num_bytes: 623392579.34 num_examples: 2620 - name: resample.4 num_bytes: 623392623.34 num_examples: 2620 - name: voice_conversion.4 num_bytes: 799852214.5 num_examples: 2620 - 
name: voice_conversion.3 num_bytes: 580185782.5 num_examples: 2620 - name: voice_conversion.1 num_bytes: 589259446.5 num_examples: 2620 - name: voice_conversion.2 num_bytes: 571175606.5 num_examples: 2620 - name: gain.1 num_bytes: 623392467.34 num_examples: 2620 - name: gain.2 num_bytes: 623392467.34 num_examples: 2620 - name: gain.3 num_bytes: 623392467.34 num_examples: 2620 - name: echo.1 num_bytes: 633872467.34 num_examples: 2620 - name: echo.2 num_bytes: 644352467.34 num_examples: 2620 - name: echo.3 num_bytes: 665312467.34 num_examples: 2620 - name: echo.4 num_bytes: 707232467.34 num_examples: 2620 - name: phaser.1 num_bytes: 623392467.34 num_examples: 2620 - name: phaser.2 num_bytes: 623392467.34 num_examples: 2620 - name: phaser.3 num_bytes: 623392467.34 num_examples: 2620 - name: tempo_up.1 num_bytes: 498896595.34 num_examples: 2620 - name: tempo_up.2 num_bytes: 415899351.34 num_examples: 2620 - name: tempo_up.3 num_bytes: 356615595.34 num_examples: 2620 - name: tempo_up.4 num_bytes: 312152811.34 num_examples: 2620 - name: tempo_down.1 num_bytes: 712318083.34 num_examples: 2620 - name: tempo_down.2 num_bytes: 830885583.34 num_examples: 2620 - name: tempo_down.3 num_bytes: 996880103.34 num_examples: 2620 - name: tempo_down.4 num_bytes: 1245871847.34 num_examples: 2620 - name: gain.4 num_bytes: 623392467.34 num_examples: 2620 - name: phaser.4 num_bytes: 623392467.34 num_examples: 2620 - name: lowpass.1 num_bytes: 623392467.34 num_examples: 2620 - name: lowpass.2 num_bytes: 623392467.34 num_examples: 2620 - name: lowpass.3 num_bytes: 623392467.34 num_examples: 2620 - name: lowpass.4 num_bytes: 623392467.34 num_examples: 2620 - name: highpass.1 num_bytes: 623392467.34 num_examples: 2620 - name: highpass.2 num_bytes: 623392467.34 num_examples: 2620 - name: highpass.3 num_bytes: 623392467.34 num_examples: 2620 - name: highpass.4 num_bytes: 623392467.34 num_examples: 2620 - name: voice_conversion_vctk.1 num_bytes: 495165825.88 num_examples: 2620 - name: 
universal_adv.1 num_bytes: 623392467.34 num_examples: 2620 - name: rir.1 num_bytes: 705636818.5 num_examples: 2620 - name: rir.2 num_bytes: 744484818.5 num_examples: 2620 - name: rir.3 num_bytes: 758740818.5 num_examples: 2620 - name: rir.4 num_bytes: 776116818.5 num_examples: 2620 - name: gnoise.1 num_bytes: 623392455.88 num_examples: 2620 - name: gnoise.2 num_bytes: 623392455.88 num_examples: 2620 - name: gnoise.3 num_bytes: 623392455.88 num_examples: 2620 - name: gnoise.4 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_esc50.1 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_esc50.2 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_esc50.3 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_esc50.4 num_bytes: 623392455.88 num_examples: 2620 - name: music.1 num_bytes: 623392455.88 num_examples: 2620 - name: music.2 num_bytes: 623392455.88 num_examples: 2620 - name: music.3 num_bytes: 623392455.88 num_examples: 2620 - name: music.4 num_bytes: 623392455.88 num_examples: 2620 - name: crosstalk.1 num_bytes: 623392455.88 num_examples: 2620 - name: crosstalk.2 num_bytes: 623392455.88 num_examples: 2620 - name: crosstalk.3 num_bytes: 623392455.88 num_examples: 2620 - name: crosstalk.4 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_musan.1 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_musan.2 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_musan.3 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_musan.4 num_bytes: 623392455.88 num_examples: 2620 - name: real_rir.1 num_bytes: 638169615.88 num_examples: 2620 - name: real_rir.2 num_bytes: 694281819.88 num_examples: 2620 - name: real_rir.3 num_bytes: 713200537.88 num_examples: 2620 - name: real_rir.4 num_bytes: 1515177725.88 num_examples: 2620 - name: env_noise.1 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise.2 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise.3 num_bytes: 623392455.88 num_examples: 
2620 - name: env_noise.4 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_wham.1 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_wham.2 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_wham.3 num_bytes: 623392455.88 num_examples: 2620 - name: env_noise_wham.4 num_bytes: 623392455.88 num_examples: 2620 - name: tremolo.1 num_bytes: 623392455.88 num_examples: 2620 - name: tremolo.2 num_bytes: 623392455.88 num_examples: 2620 - name: tremolo.3 num_bytes: 623392455.88 num_examples: 2620 - name: tremolo.4 num_bytes: 623392455.88 num_examples: 2620 - name: treble.1 num_bytes: 623392455.88 num_examples: 2620 - name: treble.2 num_bytes: 623392455.88 num_examples: 2620 - name: treble.3 num_bytes: 623392455.88 num_examples: 2620 - name: treble.4 num_bytes: 623392455.88 num_examples: 2620 - name: bass.1 num_bytes: 623392455.88 num_examples: 2620 - name: bass.2 num_bytes: 623392455.88 num_examples: 2620 - name: bass.3 num_bytes: 623392455.88 num_examples: 2620 - name: bass.4 num_bytes: 623392455.88 num_examples: 2620 - name: chorus.1 num_bytes: 626913735.88 num_examples: 2620 - name: chorus.2 num_bytes: 628590535.88 num_examples: 2620 - name: chorus.3 num_bytes: 630267335.88 num_examples: 2620 - name: chorus.4 num_bytes: 631944135.88 num_examples: 2620 - name: None.0 num_bytes: 367982506.42 num_examples: 2620 download_size: 67547733720 dataset_size: 68871044112.51988 - config_name: librispeech_asr-test.clean_pertEval_500_30 features: - name: file dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: speaker_id dtype: int64 - name: chapter_id dtype: int64 - name: id dtype: string - name: pert_idx dtype: int64 splits: - name: gnoise.1 num_bytes: 3592401090.0 num_examples: 15000 - name: env_noise_esc50.1 num_bytes: 3592401090.0 num_examples: 15000 download_size: 7170899040 dataset_size: 7184802180.0 - config_name: multilingual_librispeech-spanish_test features: - name: file dtype: string - name: audio 
dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: speaker_id dtype: int64 - name: chapter_id dtype: int64 - name: id dtype: string splits: - name: None.0 num_bytes: 596762288.01 num_examples: 2385 - name: env_noise.1 num_bytes: 1153485830.17 num_examples: 2385 - name: env_noise.2 num_bytes: 1153485830.17 num_examples: 2385 - name: env_noise.3 num_bytes: 1153485830.17 num_examples: 2385 - name: env_noise.4 num_bytes: 1153485830.17 num_examples: 2385 - name: rir.1 num_bytes: 1268493860.17 num_examples: 2385 - name: rir.2 num_bytes: 1252109860.17 num_examples: 2385 - name: rir.3 num_bytes: 1249517860.17 num_examples: 2385 - name: rir.4 num_bytes: 1222893860.17 num_examples: 2385 - name: speedup.1 num_bytes: 923001764.17 num_examples: 2385 - name: speedup.2 num_bytes: 769347364.17 num_examples: 2385 - name: speedup.3 num_bytes: 659593516.17 num_examples: 2385 - name: speedup.4 num_bytes: 577275652.17 num_examples: 2385 - name: slowdown.1 num_bytes: 1318119422.17 num_examples: 2385 - name: slowdown.2 num_bytes: 1537627530.17 num_examples: 2385 - name: slowdown.3 num_bytes: 1844938056.17 num_examples: 2385 - name: slowdown.4 num_bytes: 2305906194.17 num_examples: 2385 - name: pitch_up.3 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_up.4 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_down.1 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_down.2 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_down.3 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_down.4 num_bytes: 1153485830.17 num_examples: 2385 - name: pitch_up.1 num_bytes: 1153485821.72 num_examples: 2385 - name: pitch_up.2 num_bytes: 1153485821.72 num_examples: 2385 - name: resample.2 num_bytes: 1153485842.17 num_examples: 2385 - name: gain.1 num_bytes: 1153485830.17 num_examples: 2385 - name: gain.2 num_bytes: 1153485830.17 num_examples: 2385 - name: gain.3 num_bytes: 1153485830.17 num_examples: 2385 - name: gain.4 num_bytes: 1153485830.17 
num_examples: 2385 - name: echo.1 num_bytes: 1163025830.17 num_examples: 2385 - name: echo.2 num_bytes: 1172565830.17 num_examples: 2385 - name: echo.3 num_bytes: 1191645830.17 num_examples: 2385 - name: echo.4 num_bytes: 1229805830.17 num_examples: 2385 - name: tempo_up.1 num_bytes: 923001758.17 num_examples: 2385 - name: tempo_up.2 num_bytes: 769345632.17 num_examples: 2385 - name: tempo_up.3 num_bytes: 659591372.17 num_examples: 2385 - name: tempo_up.4 num_bytes: 577275652.17 num_examples: 2385 - name: tempo_down.1 num_bytes: 1318117252.17 num_examples: 2385 - name: tempo_down.2 num_bytes: 1537626028.17 num_examples: 2385 - name: tempo_down.3 num_bytes: 1844938048.17 num_examples: 2385 - name: tempo_down.4 num_bytes: 2305906194.17 num_examples: 2385 - name: phaser.1 num_bytes: 1153485830.17 num_examples: 2385 - name: phaser.2 num_bytes: 1153485830.17 num_examples: 2385 - name: phaser.3 num_bytes: 1153485830.17 num_examples: 2385 - name: phaser.4 num_bytes: 1153485830.17 num_examples: 2385 - name: resample.1 num_bytes: 1153485840.17 num_examples: 2385 - name: resample.3 num_bytes: 1153485850.17 num_examples: 2385 - name: resample.4 num_bytes: 1153485882.17 num_examples: 2385 - name: lowpass.1 num_bytes: 1153485830.17 num_examples: 2385 - name: lowpass.2 num_bytes: 1153485830.17 num_examples: 2385 - name: lowpass.3 num_bytes: 1153485830.17 num_examples: 2385 - name: lowpass.4 num_bytes: 1153485830.17 num_examples: 2385 - name: highpass.1 num_bytes: 1153485830.17 num_examples: 2385 - name: highpass.2 num_bytes: 1153485830.17 num_examples: 2385 - name: highpass.3 num_bytes: 1153485830.17 num_examples: 2385 - name: highpass.4 num_bytes: 1153485830.17 num_examples: 2385 - name: gnoise.1 num_bytes: 1153485822.49 num_examples: 2385 - name: gnoise.2 num_bytes: 1153485822.49 num_examples: 2385 - name: gnoise.3 num_bytes: 1153485822.49 num_examples: 2385 - name: gnoise.4 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_esc50.1 num_bytes: 1153485822.49 
num_examples: 2385 - name: env_noise_esc50.2 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_esc50.3 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_esc50.4 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_musan.1 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_musan.2 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_musan.3 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_musan.4 num_bytes: 1153485822.49 num_examples: 2385 - name: music.1 num_bytes: 1153485822.49 num_examples: 2385 - name: music.2 num_bytes: 1153485822.49 num_examples: 2385 - name: music.3 num_bytes: 1153485822.49 num_examples: 2385 - name: music.4 num_bytes: 1153485822.49 num_examples: 2385 - name: crosstalk.1 num_bytes: 1153485822.49 num_examples: 2385 - name: crosstalk.2 num_bytes: 1153485822.49 num_examples: 2385 - name: crosstalk.3 num_bytes: 1153485822.49 num_examples: 2385 - name: crosstalk.4 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_wham.1 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_wham.2 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_wham.3 num_bytes: 1153485822.49 num_examples: 2385 - name: env_noise_wham.4 num_bytes: 1153485822.49 num_examples: 2385 - name: tremolo.1 num_bytes: 1153485822.49 num_examples: 2385 - name: tremolo.2 num_bytes: 1153485822.49 num_examples: 2385 - name: tremolo.4 num_bytes: 1153485822.49 num_examples: 2385 - name: treble.1 num_bytes: 1153485822.49 num_examples: 2385 - name: treble.2 num_bytes: 1153485822.49 num_examples: 2385 - name: treble.3 num_bytes: 1153485822.49 num_examples: 2385 - name: treble.4 num_bytes: 1153485822.49 num_examples: 2385 - name: bass.1 num_bytes: 1153485822.49 num_examples: 2385 - name: bass.2 num_bytes: 1153485822.49 num_examples: 2385 - name: bass.3 num_bytes: 1153485822.49 num_examples: 2385 - name: bass.4 num_bytes: 1153485822.49 num_examples: 2385 - name: chorus.1 num_bytes: 1156691262.49 
num_examples: 2385 - name: chorus.2 num_bytes: 1158217662.49 num_examples: 2385 - name: chorus.3 num_bytes: 1159744062.49 num_examples: 2385 - name: chorus.4 num_bytes: 1161270462.49 num_examples: 2385 - name: tremolo.3 num_bytes: 1153485822.49 num_examples: 2385 download_size: 117646635522 dataset_size: 113291392188.23016 - config_name: multilingual_librispeech-spanish_test_pertEval_500_30 features: - name: file dtype: string - name: audio dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: speaker_id dtype: int64 - name: chapter_id dtype: int64 - name: id dtype: string - name: pert_idx dtype: int64 splits: - name: gnoise.1 num_bytes: 7341021960.0 num_examples: 15000 - name: env_noise_esc50.1 num_bytes: 7341021960.0 num_examples: 15000 download_size: 14645523867 dataset_size: 14682043920.0 - config_name: tedlium-release3_test features: - name: audio dtype: audio: sampling_rate: 16000 - name: text dtype: string - name: speaker_id dtype: string - name: gender dtype: class_label: names: '0': unknown '1': female '2': male - name: file dtype: string - name: id dtype: string splits: - name: None.0 num_bytes: 277376247.9680054 num_examples: 1155 - name: speedup.1 num_bytes: 221990159.49965963 num_examples: 1155 - name: speedup.2 num_bytes: 185066240.47311097 num_examples: 1155 - name: speedup.3 num_bytes: 158691929.4792376 num_examples: 1155 - name: slowdown.1 num_bytes: 316938966.95371 num_examples: 1155 - name: slowdown.2 num_bytes: 369687787.0762423 num_examples: 1155 - name: slowdown.3 num_bytes: 443535996.23893803 num_examples: 1155 - name: pitch_up.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_up.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_up.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_down.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_down.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_down.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: rir.1 
num_bytes: 313788218.1586113 num_examples: 1155 - name: rir.2 num_bytes: 330268000.32334924 num_examples: 1155 - name: rir.3 num_bytes: 336608313.46153843 num_examples: 1155 - name: voice_conversion_vctk.1 num_bytes: 216990920.87134105 num_examples: 1155 - name: resample.1 num_bytes: 277376301.4329476 num_examples: 1155 - name: resample.2 num_bytes: 277376301.4329476 num_examples: 1155 - name: resample.3 num_bytes: 277376354.89788973 num_examples: 1155 - name: gain.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: gain.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: gain.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: echo.1 num_bytes: 281996247.9680054 num_examples: 1155 - name: echo.2 num_bytes: 286616247.9680054 num_examples: 1155 - name: echo.3 num_bytes: 295856247.9680054 num_examples: 1155 - name: phaser.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: phaser.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: phaser.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: tempo_up.1 num_bytes: 221989786.81756297 num_examples: 1155 - name: tempo_up.2 num_bytes: 185065496.68141592 num_examples: 1155 - name: tempo_up.3 num_bytes: 158690987.55275697 num_examples: 1155 - name: tempo_down.1 num_bytes: 316938020.3097345 num_examples: 1155 - name: tempo_down.2 num_bytes: 369686999.254595 num_examples: 1155 - name: tempo_down.3 num_bytes: 443535631.41933286 num_examples: 1155 - name: lowpass.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: lowpass.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: lowpass.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: highpass.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: highpass.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: highpass.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: speedup.4 num_bytes: 138910125.75561607 num_examples: 1155 - name: slowdown.4 num_bytes: 554308545.8577263 num_examples: 1155 - name: pitch_up.4 
num_bytes: 277376247.9680054 num_examples: 1155 - name: pitch_down.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: rir.4 num_bytes: 345514943.8223281 num_examples: 1155 - name: resample.4 num_bytes: 277376474.4077604 num_examples: 1155 - name: gain.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: echo.4 num_bytes: 314336247.9680054 num_examples: 1155 - name: phaser.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: tempo_up.4 num_bytes: 138910125.75561607 num_examples: 1155 - name: tempo_down.4 num_bytes: 554308545.8577263 num_examples: 1155 - name: lowpass.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: highpass.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: gnoise.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: gnoise.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: gnoise.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: music.1 num_bytes: 301958728.16 num_examples: 1155 - name: music.2 num_bytes: 301958728.16 num_examples: 1155 - name: music.3 num_bytes: 301958728.16 num_examples: 1155 - name: music.4 num_bytes: 301958728.16 num_examples: 1155 - name: crosstalk.1 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_esc50.1 num_bytes: 277376247.9680054 num_examples: 1155 - name: env_noise_esc50.2 num_bytes: 277376247.9680054 num_examples: 1155 - name: env_noise_esc50.3 num_bytes: 277376247.9680054 num_examples: 1155 - name: gnoise.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: crosstalk.2 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_esc50.4 num_bytes: 277376247.9680054 num_examples: 1155 - name: crosstalk.3 num_bytes: 301958728.16 num_examples: 1155 - name: crosstalk.4 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_musan.1 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_musan.2 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_musan.3 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_musan.4 num_bytes: 
301958728.16 num_examples: 1155 - name: real_rir.1 num_bytes: 308750878.16 num_examples: 1155 - name: real_rir.2 num_bytes: 333286988.16 num_examples: 1155 - name: real_rir.3 num_bytes: 341205738.16 num_examples: 1155 - name: real_rir.4 num_bytes: 715155314.16 num_examples: 1155 - name: env_noise.1 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise.2 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise.3 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise.4 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_wham.1 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_wham.2 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_wham.3 num_bytes: 301958728.16 num_examples: 1155 - name: env_noise_wham.4 num_bytes: 301958728.16 num_examples: 1155 - name: tremolo.1 num_bytes: 301958728.16 num_examples: 1155 - name: tremolo.2 num_bytes: 301958728.16 num_examples: 1155 - name: tremolo.3 num_bytes: 301958728.16 num_examples: 1155 - name: tremolo.4 num_bytes: 301958728.16 num_examples: 1155 - name: treble.1 num_bytes: 301958728.16 num_examples: 1155 - name: treble.2 num_bytes: 301958728.16 num_examples: 1155 - name: treble.3 num_bytes: 301958728.16 num_examples: 1155 - name: treble.4 num_bytes: 301958728.16 num_examples: 1155 - name: bass.1 num_bytes: 301958728.16 num_examples: 1155 - name: bass.2 num_bytes: 301958728.16 num_examples: 1155 - name: bass.3 num_bytes: 301958728.16 num_examples: 1155 - name: bass.4 num_bytes: 301958728.16 num_examples: 1155 - name: chorus.1 num_bytes: 303511048.16 num_examples: 1155 - name: chorus.2 num_bytes: 304250248.16 num_examples: 1155 - name: chorus.4 num_bytes: 305728648.16 num_examples: 1155 - name: chorus.3 num_bytes: 304989448.16 num_examples: 1155 download_size: 58723208514 dataset_size: 30342709961.007984 configs: - config_name: accented_cv data_files: - split: test path: accented_cv/test-* - split: test.clean path: accented_cv/test.clean-* - config_name: chime data_files: - split: 
farfield path: chime/farfield-* - split: nearfield path: chime/nearfield-* - config_name: in-the-wild data_files: - split: farfield path: in-the-wild/farfield-* - split: nearfield path: in-the-wild/nearfield-* - config_name: in-the-wild-AMI data_files: - split: nearfield path: in-the-wild-AMI/nearfield-* - split: farfield path: in-the-wild-AMI/farfield-* - config_name: in-the-wild-ami data_files: - split: nearfield path: in-the-wild-ami/nearfield-* - split: farfield path: in-the-wild-ami/farfield-* - config_name: librispeech_asr-test.clean data_files: - split: None.0 path: librispeech_asr-test.clean/None.0-* - split: gnoise.1 path: librispeech_asr-test.clean/gnoise.1-* - split: gnoise.2 path: librispeech_asr-test.clean/gnoise.2-* - split: gnoise.3 path: librispeech_asr-test.clean/gnoise.3-* - split: gnoise.4 path: librispeech_asr-test.clean/gnoise.4-* - split: env_noise.1 path: librispeech_asr-test.clean/env_noise.1-* - split: env_noise.2 path: librispeech_asr-test.clean/env_noise.2-* - split: env_noise.3 path: librispeech_asr-test.clean/env_noise.3-* - split: env_noise.4 path: librispeech_asr-test.clean/env_noise.4-* - split: rir.1 path: librispeech_asr-test.clean/rir.1-* - split: rir.2 path: librispeech_asr-test.clean/rir.2-* - split: rir.3 path: librispeech_asr-test.clean/rir.3-* - split: rir.4 path: librispeech_asr-test.clean/rir.4-* - split: speedup.1 path: librispeech_asr-test.clean/speedup.1-* - split: speedup.2 path: librispeech_asr-test.clean/speedup.2-* - split: speedup.3 path: librispeech_asr-test.clean/speedup.3-* - split: speedup.4 path: librispeech_asr-test.clean/speedup.4-* - split: slowdown.1 path: librispeech_asr-test.clean/slowdown.1-* - split: slowdown.2 path: librispeech_asr-test.clean/slowdown.2-* - split: slowdown.3 path: librispeech_asr-test.clean/slowdown.3-* - split: slowdown.4 path: librispeech_asr-test.clean/slowdown.4-* - split: pitch_up.3 path: librispeech_asr-test.clean/pitch_up.3-* - split: pitch_up.4 path: 
librispeech_asr-test.clean/pitch_up.4-* - split: pitch_down.1 path: librispeech_asr-test.clean/pitch_down.1-* - split: pitch_down.2 path: librispeech_asr-test.clean/pitch_down.2-* - split: pitch_down.3 path: librispeech_asr-test.clean/pitch_down.3-* - split: pitch_down.4 path: librispeech_asr-test.clean/pitch_down.4-* - split: pitch_up.1 path: librispeech_asr-test.clean/pitch_up.1-* - split: pitch_up.2 path: librispeech_asr-test.clean/pitch_up.2-* - split: resample.1 path: librispeech_asr-test.clean/resample.1-* - split: resample.2 path: librispeech_asr-test.clean/resample.2-* - split: resample.3 path: librispeech_asr-test.clean/resample.3-* - split: resample.4 path: librispeech_asr-test.clean/resample.4-* - split: env_noise_esc50.1 path: librispeech_asr-test.clean/env_noise_esc50.1-* - split: env_noise_esc50.2 path: librispeech_asr-test.clean/env_noise_esc50.2-* - split: env_noise_esc50.3 path: librispeech_asr-test.clean/env_noise_esc50.3-* - split: env_noise_esc50.4 path: librispeech_asr-test.clean/env_noise_esc50.4-* - split: voice_conversion.4 path: librispeech_asr-test.clean/voice_conversion.4-* - split: voice_conversion.3 path: librispeech_asr-test.clean/voice_conversion.3-* - split: voice_conversion.1 path: librispeech_asr-test.clean/voice_conversion.1-* - split: voice_conversion.2 path: librispeech_asr-test.clean/voice_conversion.2-* - split: gain.1 path: librispeech_asr-test.clean/gain.1-* - split: gain.2 path: librispeech_asr-test.clean/gain.2-* - split: gain.3 path: librispeech_asr-test.clean/gain.3-* - split: echo.1 path: librispeech_asr-test.clean/echo.1-* - split: echo.2 path: librispeech_asr-test.clean/echo.2-* - split: echo.3 path: librispeech_asr-test.clean/echo.3-* - split: echo.4 path: librispeech_asr-test.clean/echo.4-* - split: phaser.1 path: librispeech_asr-test.clean/phaser.1-* - split: phaser.2 path: librispeech_asr-test.clean/phaser.2-* - split: phaser.3 path: librispeech_asr-test.clean/phaser.3-* - split: tempo_up.1 path: 
librispeech_asr-test.clean/tempo_up.1-* - split: tempo_up.2 path: librispeech_asr-test.clean/tempo_up.2-* - split: tempo_up.3 path: librispeech_asr-test.clean/tempo_up.3-* - split: tempo_up.4 path: librispeech_asr-test.clean/tempo_up.4-* - split: tempo_down.1 path: librispeech_asr-test.clean/tempo_down.1-* - split: tempo_down.2 path: librispeech_asr-test.clean/tempo_down.2-* - split: tempo_down.3 path: librispeech_asr-test.clean/tempo_down.3-* - split: tempo_down.4 path: librispeech_asr-test.clean/tempo_down.4-* - split: gain.4 path: librispeech_asr-test.clean/gain.4-* - split: lowpass.1 path: librispeech_asr-test.clean/lowpass.1-* - split: lowpass.2 path: librispeech_asr-test.clean/lowpass.2-* - split: lowpass.3 path: librispeech_asr-test.clean/lowpass.3-* - split: lowpass.4 path: librispeech_asr-test.clean/lowpass.4-* - split: highpass.1 path: librispeech_asr-test.clean/highpass.1-* - split: highpass.2 path: librispeech_asr-test.clean/highpass.2-* - split: highpass.3 path: librispeech_asr-test.clean/highpass.3-* - split: highpass.4 path: librispeech_asr-test.clean/highpass.4-* - split: phaser.4 path: librispeech_asr-test.clean/phaser.4-* - split: voice_conversion_vctk.1 path: librispeech_asr-test.clean/voice_conversion_vctk.1-* - split: universal_adv.1 path: librispeech_asr-test.clean/universal_adv.1-* - split: music.1 path: librispeech_asr-test.clean/music.1-* - split: music.2 path: librispeech_asr-test.clean/music.2-* - split: music.3 path: librispeech_asr-test.clean/music.3-* - split: music.4 path: librispeech_asr-test.clean/music.4-* - split: crosstalk.1 path: librispeech_asr-test.clean/crosstalk.1-* - split: crosstalk.2 path: librispeech_asr-test.clean/crosstalk.2-* - split: crosstalk.3 path: librispeech_asr-test.clean/crosstalk.3-* - split: crosstalk.4 path: librispeech_asr-test.clean/crosstalk.4-* - split: env_noise_musan.1 path: librispeech_asr-test.clean/env_noise_musan.1-* - split: env_noise_musan.2 path: librispeech_asr-test.clean/env_noise_musan.2-* - 
split: env_noise_musan.3 path: librispeech_asr-test.clean/env_noise_musan.3-* - split: env_noise_musan.4 path: librispeech_asr-test.clean/env_noise_musan.4-* - split: real_rir.1 path: librispeech_asr-test.clean/real_rir.1-* - split: real_rir.2 path: librispeech_asr-test.clean/real_rir.2-* - split: real_rir.3 path: librispeech_asr-test.clean/real_rir.3-* - split: real_rir.4 path: librispeech_asr-test.clean/real_rir.4-* - split: env_noise_wham.1 path: librispeech_asr-test.clean/env_noise_wham.1-* - split: env_noise_wham.2 path: librispeech_asr-test.clean/env_noise_wham.2-* - split: env_noise_wham.3 path: librispeech_asr-test.clean/env_noise_wham.3-* - split: env_noise_wham.4 path: librispeech_asr-test.clean/env_noise_wham.4-* - split: tremolo.1 path: librispeech_asr-test.clean/tremolo.1-* - split: tremolo.2 path: librispeech_asr-test.clean/tremolo.2-* - split: tremolo.3 path: librispeech_asr-test.clean/tremolo.3-* - split: tremolo.4 path: librispeech_asr-test.clean/tremolo.4-* - split: treble.1 path: librispeech_asr-test.clean/treble.1-* - split: treble.2 path: librispeech_asr-test.clean/treble.2-* - split: treble.3 path: librispeech_asr-test.clean/treble.3-* - split: treble.4 path: librispeech_asr-test.clean/treble.4-* - split: bass.1 path: librispeech_asr-test.clean/bass.1-* - split: bass.2 path: librispeech_asr-test.clean/bass.2-* - split: bass.3 path: librispeech_asr-test.clean/bass.3-* - split: bass.4 path: librispeech_asr-test.clean/bass.4-* - split: chorus.1 path: librispeech_asr-test.clean/chorus.1-* - split: chorus.2 path: librispeech_asr-test.clean/chorus.2-* - split: chorus.3 path: librispeech_asr-test.clean/chorus.3-* - split: chorus.4 path: librispeech_asr-test.clean/chorus.4-* - config_name: librispeech_asr-test.clean_pertEval_500_30 data_files: - split: gnoise.1 path: librispeech_asr-test.clean_pertEval_500_30/gnoise.1-* - split: env_noise_esc50.1 path: librispeech_asr-test.clean_pertEval_500_30/env_noise_esc50.1-* - config_name: 
multilingual_librispeech-spanish_test data_files: - split: None.0 path: multilingual_librispeech-spanish_test/None.0-* - split: gnoise.1 path: multilingual_librispeech-spanish_test/gnoise.1-* - split: gnoise.2 path: multilingual_librispeech-spanish_test/gnoise.2-* - split: gnoise.3 path: multilingual_librispeech-spanish_test/gnoise.3-* - split: gnoise.4 path: multilingual_librispeech-spanish_test/gnoise.4-* - split: env_noise.1 path: multilingual_librispeech-spanish_test/env_noise.1-* - split: env_noise.2 path: multilingual_librispeech-spanish_test/env_noise.2-* - split: env_noise.3 path: multilingual_librispeech-spanish_test/env_noise.3-* - split: env_noise.4 path: multilingual_librispeech-spanish_test/env_noise.4-* - split: rir.1 path: multilingual_librispeech-spanish_test/rir.1-* - split: rir.2 path: multilingual_librispeech-spanish_test/rir.2-* - split: rir.3 path: multilingual_librispeech-spanish_test/rir.3-* - split: rir.4 path: multilingual_librispeech-spanish_test/rir.4-* - split: speedup.1 path: multilingual_librispeech-spanish_test/speedup.1-* - split: speedup.2 path: multilingual_librispeech-spanish_test/speedup.2-* - split: speedup.3 path: multilingual_librispeech-spanish_test/speedup.3-* - split: speedup.4 path: multilingual_librispeech-spanish_test/speedup.4-* - split: slowdown.1 path: multilingual_librispeech-spanish_test/slowdown.1-* - split: slowdown.2 path: multilingual_librispeech-spanish_test/slowdown.2-* - split: slowdown.3 path: multilingual_librispeech-spanish_test/slowdown.3-* - split: slowdown.4 path: multilingual_librispeech-spanish_test/slowdown.4-* - split: pitch_up.3 path: multilingual_librispeech-spanish_test/pitch_up.3-* - split: pitch_up.4 path: multilingual_librispeech-spanish_test/pitch_up.4-* - split: pitch_down.1 path: multilingual_librispeech-spanish_test/pitch_down.1-* - split: pitch_down.2 path: multilingual_librispeech-spanish_test/pitch_down.2-* - split: pitch_down.3 path: multilingual_librispeech-spanish_test/pitch_down.3-* 
- split: pitch_down.4 path: multilingual_librispeech-spanish_test/pitch_down.4-* - split: pitch_up.1 path: multilingual_librispeech-spanish_test/pitch_up.1-* - split: pitch_up.2 path: multilingual_librispeech-spanish_test/pitch_up.2-* - split: resample.2 path: multilingual_librispeech-spanish_test/resample.2-* - split: resample.3 path: multilingual_librispeech-spanish_test/resample.3-* - split: resample.4 path: multilingual_librispeech-spanish_test/resample.4-* - split: env_noise_esc50.1 path: multilingual_librispeech-spanish_test/env_noise_esc50.1-* - split: env_noise_esc50.2 path: multilingual_librispeech-spanish_test/env_noise_esc50.2-* - split: env_noise_esc50.3 path: multilingual_librispeech-spanish_test/env_noise_esc50.3-* - split: env_noise_esc50.4 path: multilingual_librispeech-spanish_test/env_noise_esc50.4-* - split: resample.1 path: multilingual_librispeech-spanish_test/resample.1-* - split: gain.1 path: multilingual_librispeech-spanish_test/gain.1-* - split: gain.2 path: multilingual_librispeech-spanish_test/gain.2-* - split: gain.3 path: multilingual_librispeech-spanish_test/gain.3-* - split: gain.4 path: multilingual_librispeech-spanish_test/gain.4-* - split: echo.4 path: multilingual_librispeech-spanish_test/echo.4-* - split: echo.1 path: multilingual_librispeech-spanish_test/echo.1-* - split: echo.2 path: multilingual_librispeech-spanish_test/echo.2-* - split: echo.3 path: multilingual_librispeech-spanish_test/echo.3-* - split: tempo_up.1 path: multilingual_librispeech-spanish_test/tempo_up.1-* - split: tempo_up.2 path: multilingual_librispeech-spanish_test/tempo_up.2-* - split: tempo_up.3 path: multilingual_librispeech-spanish_test/tempo_up.3-* - split: tempo_up.4 path: multilingual_librispeech-spanish_test/tempo_up.4-* - split: tempo_down.1 path: multilingual_librispeech-spanish_test/tempo_down.1-* - split: tempo_down.2 path: multilingual_librispeech-spanish_test/tempo_down.2-* - split: tempo_down.3 path: 
multilingual_librispeech-spanish_test/tempo_down.3-* - split: tempo_down.4 path: multilingual_librispeech-spanish_test/tempo_down.4-* - split: lowpass.1 path: multilingual_librispeech-spanish_test/lowpass.1-* - split: lowpass.2 path: multilingual_librispeech-spanish_test/lowpass.2-* - split: lowpass.3 path: multilingual_librispeech-spanish_test/lowpass.3-* - split: lowpass.4 path: multilingual_librispeech-spanish_test/lowpass.4-* - split: highpass.1 path: multilingual_librispeech-spanish_test/highpass.1-* - split: highpass.2 path: multilingual_librispeech-spanish_test/highpass.2-* - split: highpass.3 path: multilingual_librispeech-spanish_test/highpass.3-* - split: highpass.4 path: multilingual_librispeech-spanish_test/highpass.4-* - split: phaser.1 path: multilingual_librispeech-spanish_test/phaser.1-* - split: phaser.2 path: multilingual_librispeech-spanish_test/phaser.2-* - split: phaser.3 path: multilingual_librispeech-spanish_test/phaser.3-* - split: phaser.4 path: multilingual_librispeech-spanish_test/phaser.4-* - split: env_noise_musan.1 path: multilingual_librispeech-spanish_test/env_noise_musan.1-* - split: env_noise_musan.2 path: multilingual_librispeech-spanish_test/env_noise_musan.2-* - split: env_noise_musan.3 path: multilingual_librispeech-spanish_test/env_noise_musan.3-* - split: env_noise_musan.4 path: multilingual_librispeech-spanish_test/env_noise_musan.4-* - split: music.1 path: multilingual_librispeech-spanish_test/music.1-* - split: music.2 path: multilingual_librispeech-spanish_test/music.2-* - split: music.3 path: multilingual_librispeech-spanish_test/music.3-* - split: music.4 path: multilingual_librispeech-spanish_test/music.4-* - split: crosstalk.1 path: multilingual_librispeech-spanish_test/crosstalk.1-* - split: crosstalk.2 path: multilingual_librispeech-spanish_test/crosstalk.2-* - split: crosstalk.3 path: multilingual_librispeech-spanish_test/crosstalk.3-* - split: crosstalk.4 path: multilingual_librispeech-spanish_test/crosstalk.4-* - 
split: env_noise_wham.1 path: multilingual_librispeech-spanish_test/env_noise_wham.1-* - split: env_noise_wham.2 path: multilingual_librispeech-spanish_test/env_noise_wham.2-* - split: env_noise_wham.3 path: multilingual_librispeech-spanish_test/env_noise_wham.3-* - split: env_noise_wham.4 path: multilingual_librispeech-spanish_test/env_noise_wham.4-* - split: tremolo.1 path: multilingual_librispeech-spanish_test/tremolo.1-* - split: tremolo.2 path: multilingual_librispeech-spanish_test/tremolo.2-* - split: tremolo.4 path: multilingual_librispeech-spanish_test/tremolo.4-* - split: treble.1 path: multilingual_librispeech-spanish_test/treble.1-* - split: treble.2 path: multilingual_librispeech-spanish_test/treble.2-* - split: treble.3 path: multilingual_librispeech-spanish_test/treble.3-* - split: treble.4 path: multilingual_librispeech-spanish_test/treble.4-* - split: bass.1 path: multilingual_librispeech-spanish_test/bass.1-* - split: bass.2 path: multilingual_librispeech-spanish_test/bass.2-* - split: bass.3 path: multilingual_librispeech-spanish_test/bass.3-* - split: bass.4 path: multilingual_librispeech-spanish_test/bass.4-* - split: chorus.1 path: multilingual_librispeech-spanish_test/chorus.1-* - split: chorus.2 path: multilingual_librispeech-spanish_test/chorus.2-* - split: chorus.3 path: multilingual_librispeech-spanish_test/chorus.3-* - split: chorus.4 path: multilingual_librispeech-spanish_test/chorus.4-* - split: tremolo.3 path: multilingual_librispeech-spanish_test/tremolo.3-* - config_name: multilingual_librispeech-spanish_test_pertEval_500_30 data_files: - split: gnoise.1 path: multilingual_librispeech-spanish_test_pertEval_500_30/gnoise.1-* - split: env_noise_esc50.1 path: multilingual_librispeech-spanish_test_pertEval_500_30/env_noise_esc50.1-* - config_name: tedlium-release3_test data_files: - split: gnoise.1 path: tedlium-release3_test/gnoise.1-* - split: gnoise.2 path: tedlium-release3_test/gnoise.2-* - split: gnoise.3 path: 
tedlium-release3_test/gnoise.3-* - split: env_noise_esc50.1 path: tedlium-release3_test/env_noise_esc50.1-* - split: env_noise_esc50.2 path: tedlium-release3_test/env_noise_esc50.2-* - split: env_noise_esc50.3 path: tedlium-release3_test/env_noise_esc50.3-* - split: speedup.1 path: tedlium-release3_test/speedup.1-* - split: speedup.2 path: tedlium-release3_test/speedup.2-* - split: speedup.3 path: tedlium-release3_test/speedup.3-* - split: slowdown.1 path: tedlium-release3_test/slowdown.1-* - split: slowdown.2 path: tedlium-release3_test/slowdown.2-* - split: slowdown.3 path: tedlium-release3_test/slowdown.3-* - split: pitch_up.1 path: tedlium-release3_test/pitch_up.1-* - split: pitch_up.2 path: tedlium-release3_test/pitch_up.2-* - split: pitch_up.3 path: tedlium-release3_test/pitch_up.3-* - split: pitch_down.1 path: tedlium-release3_test/pitch_down.1-* - split: pitch_down.2 path: tedlium-release3_test/pitch_down.2-* - split: pitch_down.3 path: tedlium-release3_test/pitch_down.3-* - split: rir.1 path: tedlium-release3_test/rir.1-* - split: rir.2 path: tedlium-release3_test/rir.2-* - split: rir.3 path: tedlium-release3_test/rir.3-* - split: voice_conversion_vctk.1 path: tedlium-release3_test/voice_conversion_vctk.1-* - split: resample.1 path: tedlium-release3_test/resample.1-* - split: resample.2 path: tedlium-release3_test/resample.2-* - split: resample.3 path: tedlium-release3_test/resample.3-* - split: gain.1 path: tedlium-release3_test/gain.1-* - split: gain.2 path: tedlium-release3_test/gain.2-* - split: gain.3 path: tedlium-release3_test/gain.3-* - split: echo.1 path: tedlium-release3_test/echo.1-* - split: echo.2 path: tedlium-release3_test/echo.2-* - split: echo.3 path: tedlium-release3_test/echo.3-* - split: phaser.1 path: tedlium-release3_test/phaser.1-* - split: phaser.2 path: tedlium-release3_test/phaser.2-* - split: phaser.3 path: tedlium-release3_test/phaser.3-* - split: tempo_up.1 path: tedlium-release3_test/tempo_up.1-* - split: tempo_up.2 path: 
tedlium-release3_test/tempo_up.2-* - split: tempo_up.3 path: tedlium-release3_test/tempo_up.3-* - split: tempo_down.1 path: tedlium-release3_test/tempo_down.1-* - split: tempo_down.2 path: tedlium-release3_test/tempo_down.2-* - split: tempo_down.3 path: tedlium-release3_test/tempo_down.3-* - split: lowpass.1 path: tedlium-release3_test/lowpass.1-* - split: lowpass.2 path: tedlium-release3_test/lowpass.2-* - split: lowpass.3 path: tedlium-release3_test/lowpass.3-* - split: highpass.1 path: tedlium-release3_test/highpass.1-* - split: highpass.2 path: tedlium-release3_test/highpass.2-* - split: highpass.3 path: tedlium-release3_test/highpass.3-* - split: gnoise.4 path: tedlium-release3_test/gnoise.4-* - split: env_noise_esc50.4 path: tedlium-release3_test/env_noise_esc50.4-* - split: speedup.4 path: tedlium-release3_test/speedup.4-* - split: slowdown.4 path: tedlium-release3_test/slowdown.4-* - split: pitch_up.4 path: tedlium-release3_test/pitch_up.4-* - split: pitch_down.4 path: tedlium-release3_test/pitch_down.4-* - split: rir.4 path: tedlium-release3_test/rir.4-* - split: resample.4 path: tedlium-release3_test/resample.4-* - split: gain.4 path: tedlium-release3_test/gain.4-* - split: echo.4 path: tedlium-release3_test/echo.4-* - split: phaser.4 path: tedlium-release3_test/phaser.4-* - split: tempo_up.4 path: tedlium-release3_test/tempo_up.4-* - split: tempo_down.4 path: tedlium-release3_test/tempo_down.4-* - split: lowpass.4 path: tedlium-release3_test/lowpass.4-* - split: highpass.4 path: tedlium-release3_test/highpass.4-* - split: None.0 path: tedlium-release3_test/None.0-* - split: music.1 path: tedlium-release3_test/music.1-* - split: music.2 path: tedlium-release3_test/music.2-* - split: music.3 path: tedlium-release3_test/music.3-* - split: music.4 path: tedlium-release3_test/music.4-* - split: crosstalk.1 path: tedlium-release3_test/crosstalk.1-* - split: crosstalk.2 path: tedlium-release3_test/crosstalk.2-* - split: crosstalk.3 path: 
tedlium-release3_test/crosstalk.3-* - split: crosstalk.4 path: tedlium-release3_test/crosstalk.4-* - split: env_noise_musan.1 path: tedlium-release3_test/env_noise_musan.1-* - split: env_noise_musan.2 path: tedlium-release3_test/env_noise_musan.2-* - split: env_noise_musan.3 path: tedlium-release3_test/env_noise_musan.3-* - split: env_noise_musan.4 path: tedlium-release3_test/env_noise_musan.4-* - split: real_rir.1 path: tedlium-release3_test/real_rir.1-* - split: real_rir.2 path: tedlium-release3_test/real_rir.2-* - split: real_rir.3 path: tedlium-release3_test/real_rir.3-* - split: real_rir.4 path: tedlium-release3_test/real_rir.4-* - split: env_noise.1 path: tedlium-release3_test/env_noise.1-* - split: env_noise.2 path: tedlium-release3_test/env_noise.2-* - split: env_noise.3 path: tedlium-release3_test/env_noise.3-* - split: env_noise.4 path: tedlium-release3_test/env_noise.4-* - split: env_noise_wham.1 path: tedlium-release3_test/env_noise_wham.1-* - split: env_noise_wham.2 path: tedlium-release3_test/env_noise_wham.2-* - split: env_noise_wham.3 path: tedlium-release3_test/env_noise_wham.3-* - split: env_noise_wham.4 path: tedlium-release3_test/env_noise_wham.4-* - split: tremolo.1 path: tedlium-release3_test/tremolo.1-* - split: tremolo.2 path: tedlium-release3_test/tremolo.2-* - split: tremolo.3 path: tedlium-release3_test/tremolo.3-* - split: tremolo.4 path: tedlium-release3_test/tremolo.4-* - split: treble.1 path: tedlium-release3_test/treble.1-* - split: treble.2 path: tedlium-release3_test/treble.2-* - split: treble.3 path: tedlium-release3_test/treble.3-* - split: treble.4 path: tedlium-release3_test/treble.4-* - split: bass.1 path: tedlium-release3_test/bass.1-* - split: bass.2 path: tedlium-release3_test/bass.2-* - split: bass.3 path: tedlium-release3_test/bass.3-* - split: bass.4 path: tedlium-release3_test/bass.4-* - split: chorus.1 path: tedlium-release3_test/chorus.1-* - split: chorus.2 path: tedlium-release3_test/chorus.2-* - split: chorus.4 
path: tedlium-release3_test/chorus.4-* - split: chorus.3 path: tedlium-release3_test/chorus.3-*
---

# Dataset Card for "speech_robust_bench"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

mshah1

Summary of original information

Dataset overview

Dataset configuration `accented_cv`

Features:
- audio: sampling rate 16000 Hz
- text: string
- age: string
- gender: string
- accents: string
- locale: string
- id: integer (int64)
Splits:
- test: 55407854.085 bytes, 1355 examples
- test.clean: 25593824.0 bytes, 640 examples
Download size: 78598662 bytes
Dataset size: 81001678.08500001 bytes

Dataset configuration `chime`

Features:
- audio: audio
- end_time: string
- start_time: string
- speaker: string
- ref: string
- location: string
- session_id: string
- text: string
Splits:
- farfield: 521160936.31 bytes, 6535 examples
- nearfield: 1072274621.0799999 bytes, 6535 examples
Download size: 1532887016 bytes
Dataset size: 1593435557.3899999 bytes

Dataset configuration `in-the-wild`

Features:
- audio: audio
- end_time: string
- start_time: string
- speaker: string
- ref: string
- location: string
- session_id: string
- id: string
- text: string
Splits:
- farfield: 521363521.31 bytes, 6535 examples
- nearfield: 1072477206.0799999 bytes, 6535 examples
Download size: 1533124839 bytes
Dataset size: 1593840727.3899999 bytes

Dataset configuration `in-the-wild-AMI`

Features:
- meeting_id: string
- id: string
- text: string
- audio: sampling rate 16000 Hz
- begin_time: float (float32)
- end_time: float (float32)
- microphone_id: string
- speaker_id: string
Splits:
- nearfield: 1382749390.9785259 bytes, 6584 examples
- farfield: 1040706691.1008185 bytes, 6584 examples
Download size: 2164898498 bytes
Dataset size: 2423456082.0793443 bytes

Dataset configuration `in-the-wild-ami`

Features:
- meeting_id: string
- audio_id: string
- text: string
- audio: sampling rate 16000 Hz
- begin_time: float (float32)
- end_time: float (float32)
- microphone_id: string
- speaker_id: string
Splits:
- nearfield: 1382749390.9785259 bytes, 6584 examples
- farfield: 1040706691.1008185 bytes, 6584 examples
Download size: 2164900274 bytes
Dataset size: 2423456082.0793443 bytes

Dataset configuration `librispeech_asr-test.clean`

Features:
- file: string
- audio: sampling rate 16000 Hz
- text: string
- speaker_id: integer (int64)
- chapter_id: integer (int64)
- id: string
Splits:
- speedup.1: 498896619.34 bytes, 2620 examples
- speedup.2: 415901075.34 bytes, 2620 examples
- speedup.3: 356617835.34 bytes, 2620 examples
- speedup.4: 312152811.34 bytes, 2620 examples
- slowdown.1: 712320343.34 bytes, 2620 examples
- slowdown.2: 830887339.34 bytes, 2620 examples
- slowdown.3: 996880127.34 bytes, 2620 examples
- slowdown.4: 1245871847.34 bytes, 2620 examples
- pitch_up.3: 623392467.34 bytes, 2620 examples
- pitch_up.4: 623392467.34 bytes, 2620 examples
- pitch_down.1: 623392467.34 bytes, 2620 examples
- pitch_down.2: 623392467.34 bytes, 2620 examples
- pitch_down.3: 623392467.34 bytes, 2620 examples
- pitch_down.4: 623392467.34 bytes, 2620 examples
- pitch_up.1: 623392458.5 bytes, 2620 examples
- pitch_up.2: 623392458.5 bytes, 2620 examples
- resample.1: 623392535.34 bytes, 2620 examples
- resample.2: 623392535.34 bytes, 2620 examples
- resample.3: 623392579.34 bytes, 2620 examples
- resample.4: 623392623.34 bytes, 2620 examples
- voice_conversion.4: 799852214.5 bytes, 2620 examples
- voice_conversion.3: 580185782.5 bytes, 2620 examples
- voice_conversion.1: 589259446.5 bytes, 2620 examples
- voice_conversion.2: 571175606.5 bytes, 2620 examples
- gain.1: 623392467.34 bytes, 2620 examples
- gain.2: 623392467.34 bytes, 2620 examples
- gain.3: 623392467.34 bytes, 2620 examples
- echo.1: 633872467.34 bytes, 2620 examples
- echo.2: 644352467.34 bytes, 2620 examples
- echo.3: 665312467.34 bytes, 2620 examples
- echo.4: 707232467.34 bytes, 2620 examples
- phaser.1: 623392467.34 bytes, 2620 examples
- phaser.2: 623392467.34 bytes, 2620 examples
- phaser.3: 623392467.34 bytes, 2620 examples
- tempo_up.1: 498896595.34 bytes, 2620 examples
- tempo_up.2: 415899351.34 bytes, 2620 examples
- tempo_up.3: 356615595.34 bytes, 2620 examples
- tempo_up.4: 312152811.34 bytes, 2620 examples
- tempo_down.1: 712318083.34 bytes, 2620 examples
- tempo_down.2: 830885583.34 bytes, 2620 examples
- tempo_down.3: 996880103.34 bytes, 2620 examples
- tempo_down.4: 1245871847.34 bytes, 2620 examples
- gain.4: 623392467.34 bytes, 2620 examples
- phaser.4: 623392467.34 bytes, 2620 examples
- lowpass.1: 623392467.34 bytes, 2620 examples
- lowpass.2: 623392467.34 bytes, 2620 examples
- lowpass.3: 623392467.34 bytes, 2620 examples
- lowpass.4: 623392467.34 bytes, 2620 examples
- highpass.1: 623392467.34 bytes, 2620 examples
- highpass.2: 623392467.34 bytes, 2620 examples
- highpass.3: 623392467.34 bytes, 2620 examples
- highpass.4: 623392467.34 bytes, 2620 examples
- voice_conversion_vctk.1: 495165825.88 bytes, 2620 examples
- universal_adv.1: 623392467.34 bytes, 2620 examples
- rir.1: 705636818.5 bytes, 2620 examples
- rir.2: 744484818.5 bytes, 2620 examples
- rir.3: 758740818.5 bytes, 2620 examples
- rir.4: 776116818.5 bytes, 2620 examples
- gnoise.1: 623392455.88 bytes, 2620 examples
- gnoise.2: 623392455.88 bytes, 2620 examples
- gnoise.3: 623392455.88 bytes, 2620 examples
- gnoise.4: 623392455.88 bytes, 2620 examples
- env_noise_esc50.1: 623392455.88 bytes, 2620 examples
- env_noise_esc50.2: 623392455.88 bytes, 2620 examples
- env_noise_esc50.3: 623392455.88 bytes, 2620 examples
- env_noise_esc50.4: 623392455.88 bytes, 2620 examples
- music.1: 623392455.88 bytes, 2620 examples
- music.2: 623392455.88 bytes, 2620 examples
- music.3: 623392455.88 bytes, 2620 examples
- music.4: 623392455.88 bytes, 2620 examples
- crosstalk.1: 623392455.88 bytes, 2620 examples
- crosstalk.2: 623392455.88 bytes, 2620 examples
- crosstalk.3: 623392455.88 bytes, 2620 examples
- crosstalk.4: 623392455.88 bytes, 2620 examples
- env_noise_musan.1: 623392455.88 bytes, 2620 examples
- env_noise_musan.2: 623392455.88 bytes, 2620 examples
- env_noise_musan.3: 623392455.88 bytes, 2620 examples
- env_noise_musan.4: 623392455.88 bytes, 2620 examples
- real_rir.1: 638169615.88 bytes, 2620 examples
- real_rir.2: 694281819.88 bytes, 2620 examples
- real_rir.3: 713
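
The split names listed for the perturbed configurations appear to follow a `<perturbation>.<level>` pattern (for example `gnoise.3` or `voice_conversion_vctk.1`). The sketch below groups split names by perturbation on that assumption. The helper names `parse_split` and `group_by_perturbation` are hypothetical, the reading of the numeric suffix as a severity level is an inference from the names rather than something the card states, and the pattern does not apply to `accented_cv`, whose splits are `test` and `test.clean`.

```python
from collections import defaultdict


def parse_split(name):
    """Split a name such as 'gnoise.3' into ('gnoise', 3).

    Assumes the text after the last dot is an integer level.
    """
    perturbation, _, level = name.rpartition(".")
    return perturbation, int(level)


def group_by_perturbation(split_names):
    """Map each perturbation name to the sorted list of levels seen for it."""
    groups = defaultdict(list)
    for name in split_names:
        perturbation, level = parse_split(name)
        groups[perturbation].append(level)
    return {name: sorted(levels) for name, levels in groups.items()}


# A few split names copied from the configuration listings.
splits = [
    "speedup.1", "speedup.2", "gnoise.4", "gnoise.1",
    "voice_conversion_vctk.1", "None.0",
]
grouped = group_by_perturbation(splits)
```

Grouping this way makes it easy to spot perturbations that are present at only some levels in a given configuration.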

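Collected summary

Dataset introduction

Construction

In speech recognition, evaluating a model's robustness across diverse acoustic environments is essential. The Speech Robust Bench dataset combines several well-known speech corpora, such as LibriSpeech, Common Voice, and CHiME, into a single test platform. It is built by applying systematic acoustic perturbations to the original audio, including speaking-rate changes, pitch shifts, added environmental noise, and simulated reverberation. Each perturbation is configured at several severity levels, which yields test samples with controlled degrees of degradation. This tiered perturbation strategy lets the dataset cover a broad range of the acoustic variation encountered in real-world use.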
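
As an illustration of a perturbation applied at graded severity, the following minimal sketch adds Gaussian noise to a signal at a target signal-to-noise ratio (SNR). It is not the dataset's generation code: the function names are hypothetical, and the level-to-SNR mapping in `snr_by_level` is an assumed example, since the card does not state the parameters behind each level.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=0):
    """Return a copy of `signal` with white Gaussian noise at `snr_db` dB SNR."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Compute the SNR actually achieved, in dB."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# One second of a 440 Hz tone at 16 kHz stands in for speech.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

# Hypothetical mapping from severity level to SNR in dB (an assumption).
snr_by_level = {1: 30, 2: 20, 3: 10, 4: 0}
noisy = {level: add_gaussian_noise(clean, snr) for level, snr in snr_by_level.items()}
```

Lower SNR values correspond to heavier degradation, so under this assumed mapping level 4 is the hardest condition.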

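Characteristics

The dataset's core characteristics are its deliberately designed perturbation dimensions and its metadata annotations. It provides subsets covering dozens of acoustic variations, such as speech rate, pitch, background noise, and room impulse responses. Where the source corpus supplies it, contextual information is included as well, for example speaker age, gender, and accent in `accented_cv` and recording location in `chime`. Configurations that declare a sampling rate, such as `accented_cv` and `librispeech_asr-test.clean`, store audio at 16 kHz, and every sample comes with a text transcript. Together these features support fine-grained analysis of speech recognition performance under different degradation conditions and of the ways model robustness fails.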

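Usage

Researchers can use the dataset to benchmark automatic speech recognition models systematically. Loading a specific configuration name, such as `accented_cv` or `librispeech_asr-test.clean`, gives access to the corresponding original or perturbed test splits. A typical workflow loads the data with the Hugging Face `datasets` library, feeds the audio to the model for inference, compares the output with the text labels, and computes metrics such as word error rate. Because the data is organized by perturbation type and severity, model performance can be compared across both, which helps identify a model's weak points and guide subsequent work on robustness.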
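
The workflow described above can be sketched roughly as follows. The repository ID, configuration names, and the `text` field come from the card; `load_split`, `evaluate`, and the `transcribe` callable are hypothetical names introduced for illustration, and the word error rate is a plain edit-distance implementation rather than any particular library's. Text normalization (casing, punctuation) is left out and would normally be applied to both sides first.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    if total == 0:
        raise ValueError("references contain no words")
    return errors / total


def load_split(config, split):
    """Load one split; needs the third-party `datasets` package and network access."""
    from datasets import load_dataset
    return load_dataset("mshah1/speech_robust_bench", config, split=split)


def evaluate(dataset, transcribe):
    """Run `transcribe` (a user-supplied callable: example -> text) and score it."""
    references = [example["text"] for example in dataset]
    hypotheses = [transcribe(example) for example in dataset]
    return corpus_wer(references, hypotheses)


# Example call (not executed here because it downloads data):
# ds = load_split("librispeech_asr-test.clean", "gnoise.1")
# print(evaluate(ds, my_model_transcribe))
```

Running `evaluate` once per split and tabulating the results by perturbation and level gives the comparison across types and severities.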

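Background and challenges

Background overview

As automatic speech recognition matures, models perform very well under laboratory conditions, yet their robustness in the complex acoustic environments of the real world is still severely tested. The Speech Robust Bench dataset, published by mshah1, aims to evaluate systematically how speech recognition systems perform under diverse acoustic perturbations. It combines several established speech corpora, such as LibriSpeech, CHiME, and multilingual variants, and introduces a wide range of perturbation types, including background noise, reverberation, accent variation, and signal-processing distortions, providing a standardized test platform for robustness evaluation. Its central research question is how to quantify a model's ability to generalize under non-ideal conditions, which supports the move of speech technology from the laboratory to practical applications and the improvement of the usability and reliability of speech systems.

Current challenges

The dataset addresses the robustness of automatic speech recognition in complex acoustic environments, specifically model sensitivity to background noise, accent variation, reverberation, and various audio distortions. During construction, the main challenges lie in generating and annotating acoustic perturbations systematically: simulating realistic noise mixing accurately, collecting and balancing accent data across languages, and ensuring reasonable and consistent coverage of perturbation parameters such as signal-to-noise ratio and reverberation time. Integrating source corpora of different origins while keeping data formats and annotations uniform is also a complex engineering task. Taken together, these challenges show how difficult it is to build a comprehensive, reliable, and reproducible benchmark for speech robustness.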

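Common scenarios

Typical use cases

In speech recognition, this robustness benchmark is commonly used to evaluate model performance across diverse acoustic environments. By combining multiple accents, noise interference, and acoustic transformations, the dataset simulates complex real-world speech conditions and gives researchers a standardized test platform for systematically examining the robustness of automatic speech recognition systems.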

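Derived related work

Research connected to this dataset includes robustness-enhancement methods based on adversarial training, multi-task learning frameworks, and domain generalization strategies. This line of work has deepened the understanding of robustness mechanisms in speech recognition and has led to techniques such as noise-invariant feature extraction and adaptive acoustic modeling.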
