Multichannel Environmental Sound Segmentation Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4742234
Description:
This dataset is intended for multichannel environmental sound segmentation, an integrated method that performs sound source localization (SSL), sound source separation (SSS), and classification.
To train a network that handles SSL, SSS, and classification, pairs of mixed sounds and separated sound source signals with known direction of arrival (DOA) and class are required. Similar datasets exist: the DCASE2020 Task3 dataset is used for sound event localization and detection (SELD), and the Free Universal Sound Separation (FUSS) dataset is used for sound separation in DCASE 2020 Task4. However, the DCASE2020 Task3 dataset does not include the separated sound source signals, and the FUSS dataset does not have DOA labels.
We created a dataset using 10 corpora containing 75 classes of dry sources. Spatial information was simulated for an eight-channel circular microphone array with a radius of 0.1 m and microphones placed at 45-degree intervals. Single point sound sources were randomly selected from the corpora containing the dry sources. The distance d between the center of the microphone array and the sound source was set to 1.0 m, and the direction theta of the sound source was randomly selected at 5-degree intervals.
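The array and source geometry described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code. It assumes a 2-D plane, microphone 0 on the positive x-axis, counterclockwise numbering, and directions from 0 to 355 degrees. None of these conventions are stated in the description, and all function names are hypothetical.

```python
import math
import random

NUM_MICS = 8         # eight-channel circular array (from the description)
RADIUS_M = 0.1       # array radius in metres (from the description)
SOURCE_DIST_M = 1.0  # distance d from array centre to source (from the description)
DOA_STEP_DEG = 5     # direction grid spacing in degrees (from the description)


def mic_positions(num_mics=NUM_MICS, radius=RADIUS_M):
    """Return (x, y) for each microphone, equally spaced on a circle.

    Assumption: microphone 0 lies on the positive x-axis and indices
    increase counterclockwise.
    """
    step = 2.0 * math.pi / num_mics  # 45 degrees for 8 microphones
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(num_mics)]


def candidate_doas(step_deg=DOA_STEP_DEG):
    """Return the candidate directions in degrees (assumed range: 0 to 355)."""
    return list(range(0, 360, step_deg))


def source_position(theta_deg, dist=SOURCE_DIST_M):
    """Return (x, y) of a point source at direction theta_deg and distance dist."""
    theta = math.radians(theta_deg)
    return (dist * math.cos(theta), dist * math.sin(theta))


def sample_source(rng):
    """Draw one direction uniformly from the grid and return (theta_deg, (x, y))."""
    theta_deg = rng.choice(candidate_doas())
    return theta_deg, source_position(theta_deg)


if __name__ == "__main__":
    rng = random.Random(0)
    print(mic_positions())
    print(sample_source(rng))
```

With a 5-degree grid there are 72 candidate directions, so a localization target for this dataset could be represented as one of 72 angular bins.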
To generate the acoustic signal captured by the microphone array, the impulse response from the position of the sound source to the microphone array was convolved with each dry source. Each dry source was convolved with an impulse response over a random time frame t. Then, sounds recorded in a restaurant and a hall were mixed as background noise into all time frames of the 8 channels to obtain an average signal-to-noise ratio of about 15 dB. These sounds were assumed to be diffuse noise. Using the above method, we created a dataset containing 75 classes of environmental sounds. Each sound was 4.192 s long, and each mixed sound was composed of 3 classes of dry sources. The file labels.csv lists the class labels.
The training set consisted of 10,000 mixed sounds and their ground truths, and the evaluation set consisted of 1,000 mixed sounds created using dry sources not used in the training set.
Please refer to the papers below and consider citing them.
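The mixing procedure for one channel can be sketched as below: convolve a dry source with an impulse response, place it at a time offset, and add noise scaled to a target signal-to-noise ratio. This is a hedged illustration only. The description states an average SNR of about 15 dB across the dataset, whereas this sketch sets the SNR exactly for one clip. All function names are hypothetical, and the sample rate and impulse responses are not given in the description, so they are left to the caller.

```python
import math


def convolve(x, h):
    """Full linear convolution of a dry source x with an impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def place_at_offset(event, offset, total_len):
    """Place event at sample index offset in a silent clip of total_len samples.

    Samples that fall outside the clip are dropped.
    """
    out = [0.0] * total_len
    for i, v in enumerate(event):
        k = offset + i
        if 0 <= k < total_len:
            out[k] += v
    return out


def mean_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def measure_snr_db(signal, noise):
    """SNR in dB between a signal and a noise sequence."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def noise_gain_for_snr(signal, noise, target_snr_db):
    """Gain g such that measure_snr_db(signal, g * noise) equals target_snr_db."""
    p_s, p_n = mean_power(signal), mean_power(noise)
    if p_s == 0.0 or p_n == 0.0:
        raise ValueError("signal and noise must both have non-zero power")
    return math.sqrt(p_s / (p_n * 10.0 ** (target_snr_db / 10.0)))


def mix_at_snr(signal, noise, target_snr_db=15.0):
    """Add noise to signal (equal lengths) after scaling it to the target SNR."""
    g = noise_gain_for_snr(signal, noise, target_snr_db)
    return [s + g * n for s, n in zip(signal, noise)]
```

For the eight-channel case, the same steps would be repeated per channel with that channel's impulse response. Whether the noise gain should be computed per channel or once for all channels is a design choice the description does not specify.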
Y. Sudo, K. Itoyama, K. Nishida and K. Nakadai, "Sound event aware environmental sound segmentation with Mask U-Net," Journal of Advanced Robotics, 2020, Vol. 34, No. 20, pp. 1280-1290.
Y. Sudo, K. Itoyama, K. Nishida and K. Nakadai, "Multi-channel Environmental sound segmentation with Separately Trained Spectral and Spatial Features," Journal of Applied Intelligence, 2020, doi: 10.1007/s10489-021-02314-5.
Y. Sudo, K. Itoyama, K. Nishida and K. Nakadai, "Environmental sound segmentation utilizing Mask U-Net," IEEE/RSJ International Conference on Intelligent Robots and Systems, Macau, 2019, pp. 5340-5345.
Created:
2021-08-10



