TUT Sound Events 2018 - Ambisonic, Reverberant and Real-life Impulse Response Dataset
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-28.
Download link:
https://zenodo.org/record/1237793
Description:
Tampere University of Technology (TUT) Sound Events 2018 - Ambisonic, Reverberant and Real-life Impulse Response Dataset
This dataset consists of real-life first-order Ambisonic (FOA) format recordings with stationary point sources, each associated with a spatial coordinate. The dataset was generated by collecting impulse responses (IRs) from a real environment using the Eigenmike spherical microphone array. The measurement was done by slowly moving a Genelec G Two loudspeaker, which continuously played a maximum length sequence, around the array in a circular trajectory, one elevation at a time. The playback volume was set 30 dB above the ambient sound level. The recording was done in a corridor inside the university, with classrooms around it, during work hours.
The IRs were collected at elevations from −40 to 40 degrees in 10-degree increments at 1 m from the Eigenmike, and at elevations from −20 to 20 degrees in 10-degree increments at 2 m.
The dataset consists of three sub-datasets with a) a maximum of one temporally overlapping sound event, b) a maximum of two temporally overlapping sound events, and c) a maximum of three temporally overlapping sound events. Each sub-dataset has three cross-validation splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split.
For each recording, the metadata file with the same name lists the sound event name, the temporal onset and offset times (in seconds), the spatial location in azimuth and elevation angles (in degrees), and the distance from the microphone (in meters).
The isolated sound events were taken from the UrbanSound8K dataset, which contains 10 sound event classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. We do not consider the air_conditioner and children_playing sound events.
Further, we only include the sound event examples marked as foreground in UrbanSound8K. We used splits 1, 8 and 9 provided in UrbanSound8K as the three CV splits. These splits were chosen because they had a good number of examples for all the chosen sound event classes after selecting only the foreground examples.
During the sound scene synthesis, we randomly chose a sound event example and associated it with a random distance (among the collected ones), azimuth and elevation angle. The sound event example was then convolved with the respective IR for the given distance, azimuth and elevation to spatially position it.
The metadata.zip archive contains the license and the metadata for the complete dataset. Each of the remaining nine zip files contains the recordings for a given split and overlap. For example, the wav_ov3_split1_30db.zip file contains the training and testing recordings for the case of a maximum of three temporally overlapping sound events (ov3) for the first cross-validation split (split1).
Within each audio folder, the filenames for the training split have the 'train' prefix, while the testing split filenames have the 'test' prefix.
This dataset was collected as part of the work 'Sound event localization and detection of overlapping sources using convolutional recurrent neural network'. Data collector(s): Fagerlund, Eemi; Koskimies, Aino
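The measurement grid described above can be written down as a short Python sketch. The names `ELEVATIONS_BY_DISTANCE_M` and `measurement_rings` are illustrative and not part of the dataset. The azimuth resolution is not stated in this description, so the sketch only enumerates the (distance, elevation) pairs, one per circular loudspeaker trajectory.

```python
# Elevation grid per source distance, as stated in the description:
# 1 m: -40..40 degrees, 2 m: -20..20 degrees, both in 10-degree steps.
ELEVATIONS_BY_DISTANCE_M = {
    1: list(range(-40, 41, 10)),
    2: list(range(-20, 21, 10)),
}


def measurement_rings():
    """Return every (distance_m, elevation_deg) pair that was measured."""
    return [
        (distance, elevation)
        for distance, elevations in ELEVATIONS_BY_DISTANCE_M.items()
        for elevation in elevations
    ]
```

Under these figures there are nine trajectories at 1 m and five at 2 m.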
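To illustrate how the per-recording metadata might be consumed, here is a hedged Python sketch. The description lists the fields but not the file layout, so the CSV format, the absence of a header row, and the column order used below are assumptions; check them against the actual files in metadata.zip. `EventAnnotation`, `parse_annotations` and `max_overlap` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class EventAnnotation:
    name: str             # sound event class name
    onset_s: float        # onset time in seconds
    offset_s: float       # offset time in seconds
    azimuth_deg: float    # azimuth angle in degrees
    elevation_deg: float  # elevation angle in degrees
    distance_m: float     # distance from the microphone in meters


def parse_annotations(text):
    """Parse annotation rows from text.

    ASSUMPTION: headerless CSV whose columns follow the field order of
    EventAnnotation. The real files may differ.
    """
    events = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        name, *numbers = [cell.strip() for cell in row]
        events.append(EventAnnotation(name, *map(float, numbers)))
    return events


def max_overlap(events):
    """Return the largest number of events that are active simultaneously."""
    # +1 at each onset, -1 at each offset; offsets sort before onsets at
    # the same instant, so back-to-back events do not count as overlapping.
    boundaries = sorted(
        [(e.onset_s, 1) for e in events] + [(e.offset_s, -1) for e in events]
    )
    active = best = 0
    for _, step in boundaries:
        active += step
        best = max(best, active)
    return best
```

A check such as `max_overlap(events) <= 3` could be used to confirm that a recording matches the overlap limit of its sub-dataset.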
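The class and fold selection described above can be sketched in Python as follows. This is not the authors' code. It assumes each UrbanSound8K metadata row is available as a dict with `class`, `fold` and `salience` keys, and that a salience of 1 marks a foreground example; `select_examples` is a hypothetical helper.

```python
URBANSOUND8K_CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]
EXCLUDED_CLASSES = {"air_conditioner", "children_playing"}
USED_CLASSES = [c for c in URBANSOUND8K_CLASSES if c not in EXCLUDED_CLASSES]

# UrbanSound8K folds reused as the three cross-validation splits.
CV_FOLDS = (1, 8, 9)


def select_examples(rows, fold):
    """Keep foreground examples of the used classes from one fold.

    ASSUMPTION: each row is a dict with 'class', 'fold' and 'salience'
    keys, where salience 1 marks a foreground example.
    """
    return [
        row for row in rows
        if row["class"] in USED_CLASSES
        and int(row["fold"]) == fold
        and int(row["salience"]) == 1
    ]
```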
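The spatialization step can be illustrated with a minimal pure-Python sketch: a mono sound event is convolved with each channel of a four-channel FOA impulse response chosen at random from the measured positions. The `ir_bank` mapping keyed by (distance, azimuth, elevation) and the function names are hypothetical, and the direct-form convolution is chosen for readability; a real pipeline would more likely use an FFT-based routine.

```python
import random


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def spatialize(mono, foa_ir):
    """Convolve a mono event with each channel of an FOA impulse response.

    foa_ir is a list of per-channel impulse responses (four for FOA).
    """
    return [convolve(mono, channel_ir) for channel_ir in foa_ir]


def place_event(mono, ir_bank, rng):
    """Pick a random measured position and spatialize the event there.

    ASSUMPTION: ir_bank maps (distance_m, azimuth_deg, elevation_deg)
    tuples to FOA impulse responses; this layout is illustrative only.
    """
    position = rng.choice(sorted(ir_bank))
    return position, spatialize(mono, ir_bank[position])
```

Passing a seeded `random.Random` instance as `rng` keeps the placement reproducible.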
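The naming scheme above lends itself to a small parsing helper. The sketch below generalizes from the single example name `wav_ov3_split1_30db.zip`; the pattern for the other eight archives is an assumption, as are the helper names `parse_archive_name` and `subset_of`.

```python
import re

# ASSUMPTION: all nine archives follow the pattern of the one example given.
ARCHIVE_RE = re.compile(
    r"^wav_ov(?P<overlap>[1-3])_split(?P<split>\d+)_30db\.zip$"
)

# 3 overlap settings x 3 splits x (240 training + 60 testing) recordings.
TOTAL_RECORDINGS = 3 * 3 * (240 + 60)


def parse_archive_name(name):
    """Return (max_overlap, split_index) encoded in an archive name."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected archive name: {name!r}")
    return int(match["overlap"]), int(match["split"])


def subset_of(filename):
    """Return 'train' or 'test' according to the filename prefix."""
    for prefix in ("train", "test"):
        if filename.startswith(prefix):
            return prefix
    raise ValueError(f"no train/test prefix in: {filename!r}")
```

For instance, `parse_archive_name("wav_ov3_split1_30db.zip")` yields the overlap limit and split index as a tuple of integers.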
Created:
2023-06-28



