TAU-NIGENS Spatial Sound Events 2020

Mendeley Data. Updated 2024-03-27. Indexed 2024-06-28.

Download link:

https://zenodo.org/record/4064792


Resource description:

DESCRIPTION: The TAU-NIGENS Spatial Sound Events 2020 dataset contains multiple spatial sound-scene recordings, consisting of sound events of distinct categories integrated into a variety of acoustical spaces, from multiple source directions and distances as seen from the recording position. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs), captured in multiple rooms of various shapes, sizes, and acoustical absorption properties. Each scene recording is delivered in two spatial recording formats: a microphone array format (MIC) and a first-order Ambisonics format (FOA). The sound events are spatialized either as stationary sound sources in the room or as moving sound sources, in which case time-variant RIRs are used. Each sound event in the sound scene is associated with a trajectory of its direction of arrival (DoA) to the recording point, and with temporal onset and offset times. The isolated sound event recordings used for the synthesis of the sound scenes are obtained from the NIGENS general sound events database. These recordings serve as the development dataset for the Sound Event Localization and Detection Task of the DCASE 2020 Challenge.

REPORT & REFERENCE: If you use this dataset, please cite the report on its creation and the corresponding DCASE2020 task setup:

Politis, Archontis, Adavanne, Sharath, & Virtanen, Tuomas (2020). A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020), Tokyo, Japan.

A longer version with more detailed information is also available.

AIM: The dataset includes a large number of mixtures of sound events with realistic spatial properties under different acoustic conditions. It is therefore suitable for training and evaluating machine-listening models for sound event detection (SED), for general sound source localization with diverse sounds or signal-of-interest localization, and for joint sound event localization and detection (SELD). The dataset can also be used to evaluate signal processing methods that do not necessarily rely on training, such as acoustic source localization methods and multiple-source acoustic tracking. It allows the performance and robustness of these applications to be evaluated for diverse types of sounds and under diverse acoustic conditions.

SPECIFICATIONS:

- 600 one-minute-long sound scene recordings (development dataset).
- 200 one-minute-long sound scene recordings (evaluation dataset).
- Sampling rate of 24 kHz.
- About 700 sound event samples spread over 14 classes.
- 8 provided cross-validation splits of 100 recordings each, with unique sound event samples and rooms in each of them.
- Two 4-channel 3-dimensional recording formats: first-order Ambisonics (FOA) and tetrahedral microphone array (MIC).
- Realistic spatialization and reverberation through RIRs collected in 15 different enclosures.
- From about 1500 to 3500 possible RIR positions across the different rooms.
- Both static reverberant and moving reverberant sound events.
- Up to two overlapping sound events allowed, temporally and spatially.
- Realistic spatial ambient noise collected from each room is added to the spatialized sound events, at signal-to-noise ratios (SNR) ranging from noiseless (30 dB) to noisy (6 dB).

The IRs were collected in Finland by staff of Tampere University between 12/2017 and 06/2018, and between 11/2019 and 01/2020. The older measurements from five rooms were also used for the earlier development and evaluation datasets TAU Spatial Sound Events 2019, while ten additional rooms were added for this dataset. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

More detailed information on the dataset can be found in the included README file.

EXAMPLE APPLICATION: An implementation of a trainable convolutional recurrent neural network model that performs joint SELD, trained and evaluated with this dataset, is available. This implementation serves as the baseline method in the DCASE 2020 Sound Event Localization and Detection Task.

DEVELOPMENT AND EVALUATION: Version 1.0 of the dataset included only the 600 development audio recordings and labels, used by the participants of Task 3 of the DCASE2020 Challenge to train and validate their submitted systems. Version 1.1 additionally included the 200 evaluation audio recordings without labels, for the evaluation phase of DCASE2020. The latest version, 1.2, published after the completion of the challenge, also includes the labels for the evaluation files. Researchers who wish to compare their system against the DCASE2020 Challenge submissions will obtain directly comparable results if they use the evaluation data as their testing set.

DOWNLOAD INSTRUCTIONS:

- foa_dev.z01, foa_dev.z02, and foa_dev.zip: audio data of the FOA recording format (development dataset).
- mic_dev.z01, mic_dev.z02, and mic_dev.zip: audio data of the MIC recording format (development dataset).
- metadata_dev.zip: the common metadata for both formats (development dataset).
- foa_eval.zip: audio data of the FOA recording format (evaluation dataset).
- mic_eval.zip: audio data of the MIC recording format (evaluation dataset).
- metadata_eval.zip: the common metadata for both formats (evaluation dataset).

An info file (metadata_eval_info.txt) is included, specifying which of the two evaluation folds each mix file belongs to and its number of overlapping events.

Download the zip files corresponding to the format of interest and use your favorite compression tool to unzip the split zip files. To extract a split zip archive (named zip, z01, z02, ...), you could use, for example, the following syntax in a Linux or OSX terminal.

Combine the split archive into a single archive:

    zip -s 0 split.zip --out single.zip

Extract the single archive using unzip:

    unzip single.zip
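The extraction steps above can be wrapped in a small helper that first checks that every piece of the split archive has been downloaded. This is a minimal sketch, not part of the dataset's tooling: the function names missing_parts and extract_split and the intermediate file name <prefix>_single.zip are made up for illustration, and the part list assumes the three-piece naming (.z01, .z02, .zip) used by the development archives.

```shell
#!/bin/sh
# Sketch: verify that all pieces of a split archive exist, then merge and extract.
# Assumption: each split archive has exactly the parts .z01, .z02 and .zip,
# as listed for foa_dev and mic_dev in the download instructions.

# Print the name of every missing piece for a prefix such as "foa_dev".
missing_parts() {
    prefix=$1
    for ext in z01 z02 zip; do
        [ -f "$prefix.$ext" ] || printf '%s\n' "$prefix.$ext"
    done
}

# Merge the split archive into one file, then extract it into a directory.
# $1 = archive prefix, $2 = output directory (defaults to the current one).
extract_split() {
    prefix=$1
    outdir=${2:-.}
    missing=$(missing_parts "$prefix")
    if [ -n "$missing" ]; then
        printf 'missing archive parts:\n%s\n' "$missing" >&2
        return 1
    fi
    # Same two commands as in the instructions, applied to the given prefix.
    zip -s 0 "$prefix.zip" --out "${prefix}_single.zip" &&
        unzip -q "${prefix}_single.zip" -d "$outdir"
}
```

After sourcing the file, a call such as `extract_split foa_dev ./foa_dev` would merge and extract the FOA development audio, and stop with an error message if a piece is absent. The single-file evaluation archives (foa_eval.zip, mic_eval.zip) need only a plain unzip.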


Created:

2023-06-28

Dataset introduction

Background overview

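TAU-NIGENS Spatial Sound Events 2020 is a spatial audio dataset for research on sound event detection and localization. It contains 800 one-minute audio recordings covering 14 classes of sound events, delivered in two spatial recording formats (FOA and MIC). The dataset uses real spatial room impulse responses to reproduce different acoustic environments and is suitable for training and evaluating machine-listening models.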
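As a rough sizing aid, the uncompressed audio volume follows from the figures in the description: recordings × 60 s × 24 kHz × 4 channels × bytes per sample. The sketch below assumes 16-bit (2-byte) PCM samples, which this page does not state. The function name raw_audio_bytes is hypothetical, and the arithmetic needs a shell with 64-bit integers.

```shell
#!/bin/sh
# Sketch: estimate the uncompressed PCM size of a set of recordings, in bytes.
# Arguments: recordings, seconds per recording, sample rate (Hz),
#            channels, bytes per sample.
# Assumption: bytes per sample is supplied by the caller (2 for 16-bit PCM);
# the bit depth is not given in the dataset description on this page.
raw_audio_bytes() {
    echo $(( $1 * $2 * $3 * $4 * $5 ))
}
```

For example, `raw_audio_bytes 800 60 24000 4 2` covers all 800 recordings in one format and gives about 9.2 GB under the 16-bit assumption. Each recording holds 60 × 24000 = 1,440,000 samples per channel.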

The summary above was collected and generated by 遇见数据集.
