WHAM Noise Dataset

Name: WHAM Noise Dataset
Creator: Payititi
License: No description provided

Indexed by Payititi on 2024-03-04

Download link:

https://www.payititi.com/opendatasets/show-1820.html

Resource summary:

The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique background noise scene. We also created WHAMR!, an extension that adds artificial reverberation to the speech signals in addition to the background noise.

The noise audio was collected at various urban locations throughout the San Francisco Bay Area in late 2018. The environments are primarily restaurants, cafes, bars, and parks. Audio was recorded with an Apogee Sennheiser binaural microphone mounted on a tripod 1.0 to 1.5 meters above the ground.

The set of noise samples, referred to as the "WHAM! noise dataset", is provided here, along with scripts that build the WHAM! and WHAMR! datasets from the noise data and the WSJ0 dataset. We also provide the "WHAM!48kHz noise dataset", which contains the noise recordings at their original sample rate, without segmenting the clips to the durations of the WSJ0 clips. Both the WHAM! noise dataset and the WHAM!48kHz noise dataset have been processed to remove any segments containing intelligible speech. Because the WHAM!48kHz noise dataset is not segmented to WSJ0 clip durations, its clip durations vary widely.

This work is described further in our papers "WHAM!: Extending Speech Separation to Noisy Environments" and "WHAMR!: Noisy and Reverberant Single-Channel Speech Separation."

The WHAM! noise dataset and its variants, along with the relevant data generation scripts, are available for download using the link above. For WHAM!, please refer to the README for detailed instructions on how to use the mixing scripts. The WHAM! dataset is built by mixing the 2-speaker mixtures from the wsj0-2mix dataset with noise samples from the WHAM! noise dataset. Only the noise data is provided here, so users need their own access to (and license for) the WSJ0 dataset.
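The mixing step can be illustrated with a generic sketch. This is not the official WHAM! mixing script, whose exact gain scheme and channel handling are documented in its README; it assumes the common approach of scaling a noise segment to a target signal-to-noise ratio (SNR) before adding it to the speech mixture. The helpers `crop_noise` and `mix_at_snr`, the 5 dB SNR, the use of only the first noise channel, and the random placeholder signals are all hypothetical. Cropping matters mainly for WHAM!48kHz clips, which are not segmented to WSJ0 durations (they would also need resampling to the speech sample rate first, which this sketch omits).

```python
import numpy as np


def crop_noise(noise: np.ndarray, num_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Return a random window of `num_samples` samples (along axis 0) from a longer noise clip."""
    if noise.shape[0] < num_samples:
        raise ValueError("noise clip is shorter than the speech mixture")
    start = int(rng.integers(0, noise.shape[0] - num_samples + 1))
    return noise[start:start + num_samples]


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that mean(speech**2) / mean(scaled_noise**2) == 10**(snr_db / 10), then add it."""
    if speech.shape != noise.shape:
        raise ValueError("speech and noise must have the same shape")
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        raise ValueError("noise segment is silent")
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Usage with random placeholder signals standing in for real 16 kHz audio.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)      # stands in for a 1 s mono wsj0-2mix mixture
noise = rng.standard_normal((48000, 2))  # stands in for a 3 s two-channel noise clip
segment = crop_noise(noise[:, 0], len(speech), rng)  # hypothetical choice: first channel only
noisy_mixture = mix_at_snr(speech, segment, snr_db=5.0)
```

Defining the SNR against the whole speech mixture (both speakers together) is one possible convention; per-speaker or per-utterance level schemes are equally plausible, so the README remains the reference for how WHAM! itself sets noise levels.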
For WHAMR!, please likewise refer to the README for detailed instructions on how to use its mixing scripts, which can also be downloaded using the link above.

The WHAM! noise dataset is split into training, validation, and test sets following the wsj0-2mix dataset. The clips are 32-bit floating-point WAV files with 2 channels and a sampling rate of 16 kHz. The average clip duration is 10 seconds; the shortest clip is 3.5 seconds and the longest is 47.7 seconds.

WHAM! is a joint effort between Mitsubishi Electric Research Laboratories (MERL) and Whisper. If you use WHAM! or WHAM!48kHz, please cite our paper describing the dataset, "WHAM!: Extending Speech Separation to Noisy Environments." If you use WHAMR!, please cite our paper describing that dataset, "WHAMR!: Noisy and Reverberant Single-Channel Speech Separation."
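To confirm that a downloaded clip matches the stated format (32-bit floating-point WAV, 2 channels, 16 kHz) and to compute its duration, the RIFF header can be read with the Python standard library alone; the standard `wave` module handles only integer PCM, so it cannot open these float files. This is a minimal sketch: `read_wav_info` and `WavInfo` are hypothetical helpers, the parser covers only the common `fmt `/`data` chunk layout, and the 16 kHz check does not apply to the WHAM!48kHz clips.

```python
import struct
from dataclasses import dataclass
from typing import BinaryIO

WAVE_FORMAT_IEEE_FLOAT = 3
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


@dataclass
class WavInfo:
    format_tag: int
    channels: int
    sample_rate: int
    bits_per_sample: int
    data_bytes: int

    @property
    def duration_s(self) -> float:
        # One frame holds one sample per channel.
        bytes_per_frame = self.channels * self.bits_per_sample // 8
        return self.data_bytes / (bytes_per_frame * self.sample_rate)

    def is_wham_format(self) -> bool:
        """True for 32-bit float, 2-channel, 16 kHz audio (the format stated for WHAM! noise)."""
        return (self.format_tag == WAVE_FORMAT_IEEE_FLOAT and self.channels == 2
                and self.sample_rate == 16000 and self.bits_per_sample == 32)


def read_wav_info(f: BinaryIO) -> WavInfo:
    """Parse the RIFF/WAVE header of an open binary file without reading the samples."""
    riff, _riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt = None
    while True:
        header = f.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":
            body = f.read(chunk_size)
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack("<HHIIHH", body[:16])
            if tag == WAVE_FORMAT_EXTENSIBLE and len(body) >= 26:
                # The actual format code is the first 2 bytes of the SubFormat GUID.
                tag = struct.unpack("<H", body[24:26])[0]
            fmt = (tag, channels, rate, bits)
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appears before fmt chunk")
            return WavInfo(*fmt, data_bytes=chunk_size)
        else:
            f.seek(chunk_size, 1)  # skip chunks we do not need (e.g. LIST)
        if chunk_size % 2:
            f.seek(1, 1)  # RIFF chunks are padded to an even size
```

Usage, with a hypothetical path: `with open("wham_noise/tr/example.wav", "rb") as f: info = read_wav_info(f)`. Applied over every file in a split, `info.duration_s` yields the kind of minimum, mean, and maximum duration statistics quoted above.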

Provider:

Payititi

Collected and compiled

Dataset Introduction

Background and Challenges

Background Overview

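The WHAM noise dataset is a background-noise dataset for speech separation tasks, containing noise recordings collected at multiple urban locations in the San Francisco Bay Area. It is split into training, validation, and test sets that match the structure of the wsj0-2mix dataset, and it comes with the WHAMR! extension, which adds reverberation.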

The above summary was collected and generated by 遇见数据集 ("Meet Datasets").
