TUT Sound Events 2018 - Ambisonic, Anechoic and Synthetic Impulse Response Dataset

Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.

Download link:

https://zenodo.org/record/1237703


Description:

Tampere University of Technology (TUT) Sound Events 2018 - Ambisonic, Anechoic, and Synthetic Impulse Response Dataset

This dataset consists of simulated anechoic first-order Ambisonic (FOA) format recordings with stationary point sources, each associated with a spatial coordinate. It comprises three sub-datasets with a) at most one, b) at most two, and c) at most three temporally overlapping sound events. Each sub-dataset has three cross-validation splits, each consisting of 240 recordings of about 30 seconds for the training split and 60 recordings of the same length for the testing split. For each recording, the metadata file with the same name lists the sound event name, the temporal onset and offset times (in seconds), the spatial location as azimuth and elevation angles (in degrees), and the distance from the microphone (in meters).

The isolated sound events were taken from the DCASE 2016 task 2 dataset. There are 11 sound event classes: Clearing throat, Coughing, Door knock, Door slam, Drawer, Human laughter, Keyboard, Keys (put on a table), Page turning, Phone ringing, and Speech. The sound events are randomly placed on a spatial grid with 10-degree resolution over the full azimuth range and elevation angles in [-60, 60) degrees. Additionally, each sound event is placed at a random distance in [1, 10] meters from the microphone.

The license of the dataset can be found in the LICENSE file. The remaining nine zip files each contain the data for a given split and overlap. For example, ov3_split1.zip contains the audio and metadata folders for the case of at most three temporally overlapping sound events (ov3) and the first cross-validation split (split1). Within each audio/metadata folder, the filenames of the training split have the 'train' prefix, while those of the testing split have the 'test' prefix.

This dataset was collected as part of the work 'Sound event localization and detection of overlapping sources using convolutional recurrent neural network'.
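The archive naming and train/test prefix conventions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `archive_names` and `subset_of` are hypothetical, and the example filename `train_0.wav` is invented for the demonstration, since the description gives only the prefixes and not the full filename pattern.

```python
from itertools import product

# Values taken from the dataset description.
OVERLAPS = (1, 2, 3)    # maximum number of temporally overlapping events (ov1..ov3)
SPLITS = (1, 2, 3)      # cross-validation splits (split1..split3)
TRAIN_PER_SPLIT = 240   # training recordings per split
TEST_PER_SPLIT = 60     # testing recordings per split


def archive_names():
    """Return the nine expected archive names, e.g. 'ov3_split1.zip'."""
    return [f"ov{ov}_split{sp}.zip" for ov, sp in product(OVERLAPS, SPLITS)]


def subset_of(filename):
    """Classify a file as 'train' or 'test' from its filename prefix."""
    for prefix in ("train", "test"):
        if filename.startswith(prefix):
            return prefix
    raise ValueError(f"unexpected filename prefix: {filename!r}")


names = archive_names()
total_recordings = len(names) * (TRAIN_PER_SPLIT + TEST_PER_SPLIT)

print(names)
print(total_recordings)           # recordings summed over all nine archives
print(subset_of("train_0.wav"))   # hypothetical filename, for illustration only
```

Note that the total counts every recording in every archive; the description does not say whether recordings are shared between the cross-validation splits of a sub-dataset, so the number of distinct recordings may be smaller.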


Created:

2023-06-28


Background overview

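This is a simulated anechoic first-order Ambisonic (FOA) sound event dataset containing at most three temporally overlapping sound events, intended for research on sound event localization and detection. It is based on the 11 sound event classes of the DCASE 2016 task 2 dataset. The sound sources are randomly placed on a spatial grid and annotated with azimuth, elevation, and distance, and training and testing recordings are provided for three cross-validation splits.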
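To make the spatial annotation concrete, the following minimal sketch enumerates the candidate source directions and converts an (azimuth, elevation, distance) triple to Cartesian coordinates. Several details are assumptions, not stated by the dataset page: the azimuth range is taken as [-180, 180) degrees, the 10-degree resolution is assumed to apply to elevation as well as azimuth, and the axis convention (x forward, y left, z up) and the function name `to_cartesian` are chosen only for illustration.

```python
import math

# Assumption: "full azimuth" is represented as [-180, 180) degrees.
AZIMUTHS = range(-180, 180, 10)
# Elevation range [-60, 60) degrees; the 10-degree step is assumed here too.
ELEVATIONS = range(-60, 60, 10)


def to_cartesian(azimuth_deg, elevation_deg, distance_m):
    """Convert spherical coordinates to (x, y, z) in meters.

    Assumed convention: x forward, y left, z up, with elevation
    measured upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.cos(az)
    y = distance_m * math.cos(el) * math.sin(az)
    z = distance_m * math.sin(el)
    return x, y, z


# All candidate directions on the grid under the assumptions above.
grid = [(az, el) for az in AZIMUTHS for el in ELEVATIONS]
print(len(grid))

# Example: a source at azimuth 90, elevation 0, 2 m from the microphone.
print(to_cartesian(90, 0, 2.0))
```

Under these assumptions the grid has 36 azimuth values and 12 elevation values. The metadata files may use a different angle convention, so it is worth checking a few entries before relying on this conversion.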

The summary above was collected and generated by 遇见数据集.