SONYC-FSD-SED
NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/6392323
Resource description:
Created by
Yu Wang, Mark Cartwright, and Juan Pablo Bello
Publication
If using this data in academic work, please cite the following paper, which presented this dataset:
Y. Wang, M. Cartwright, and J. P. Bello, "Active Few-Shot Learning for Sound Event Detection," INTERSPEECH, 2022.
Description
SONYC-FSD-SED is an open dataset of programmatically mixed audio clips that simulates the audio data of an environmental sound monitoring system, in which sound class occurrences and co-occurrences exhibit seasonal, periodic patterns. We use recordings collected by the Sound of New York City (SONYC) acoustic sensor network [1] as backgrounds and single-labeled clips from the FSD50K dataset [2] as foreground events, and generate 576,591 strongly labeled 10-second soundscapes with Scaper (including 111,294 additional test soundscapes for the sampling-window experiment). Instead of sampling foreground sound events uniformly, we simulate the occurrence probability of each class at different times of the year, which gives the data more realistic temporal characteristics.
Source material and annotations
Because the rendered dataset is very large, we do not release the raw audio files. Instead, we release the source material and the soundscape annotations in JAMS format, from which SONYC-FSD-SED can be reproduced with Scaper using the script in the project repository.
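The regeneration step can be sketched as follows. This is not the project's script; it is a minimal sketch assuming the `scaper` package is installed and the archives have been extracted so that the foreground clips, background clips, and JAMS files sit in separate folders (the folder layout and the helpers `output_path_for` and `render_all` are hypothetical). It calls `scaper.generate_from_jams`, which re-renders a soundscape from its JAMS annotation; its `fg_path` and `bg_path` arguments replace the source folders recorded in the file.

```python
from pathlib import Path


def output_path_for(jams_file, jams_root, audio_root):
    """Mirror a JAMS file's location under jams_root into audio_root as a .wav path."""
    relative = Path(jams_file).relative_to(jams_root)
    return Path(audio_root) / relative.with_suffix(".wav")


def render_all(jams_root, audio_root, fg_path, bg_path):
    """Render every JAMS annotation under jams_root to a .wav file with Scaper."""
    import scaper  # imported here so output_path_for works without scaper installed

    for jams_file in sorted(Path(jams_root).rglob("*.jams")):
        wav_file = output_path_for(jams_file, jams_root, audio_root)
        wav_file.parent.mkdir(parents=True, exist_ok=True)
        # fg_path / bg_path override the source folders stored inside the JAMS file.
        scaper.generate_from_jams(
            str(jams_file), str(wav_file), fg_path=fg_path, bg_path=bg_path
        )
```

For example, `render_all("annotations", "audio", "source/foreground", "source/background")` would write one `.wav` per JAMS file, mirroring the annotation folder structure.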
Background material from SONYC recordings
We pick one sensor from the SONYC sensor network and subsample the recordings it collected over one year (2017). We categorize the resulting ~550k 10-second clips into 96 bins based on their timestamps, where each bin represents a unique combination of month of the year, day of the week (weekday or weekend), and time of day (four 6-hour blocks), i.e. 12 × 2 × 4 = 96 bins. Next, we run a pre-trained urban sound event classifier over all recordings and discard clips in which it detects an active sound class. Footstep and bird are exceptions: they appear too frequently to be filtered out, so we instead remove these two classes from the foreground sound material. Finally, from each bin we choose the clip with the lowest sound pressure level, yielding 96 background clips.
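The binning and selection steps can be sketched as below, assuming the classifier filtering has already been applied and each remaining clip is available as `(clip_id, timestamp, spl_db)`. The bin numbering, the choice of Saturday and Sunday as the weekend, and the helper names `bin_index` and `quietest_per_bin` are illustrative assumptions, not the authors' code.

```python
from datetime import datetime


def bin_index(ts):
    """Map a timestamp to one of 96 (month, weekday/weekend, 6-hour block) bins.

    Assumed numbering: month * 8 + is_weekend * 4 + block, giving 0..95.
    """
    month = ts.month - 1                        # 0..11
    is_weekend = 1 if ts.weekday() >= 5 else 0  # Saturday/Sunday -> 1 (assumption)
    block = ts.hour // 6                        # 0..3
    return month * 8 + is_weekend * 4 + block


def quietest_per_bin(clips):
    """Keep, for every bin, the clip with the lowest sound pressure level.

    clips: iterable of (clip_id, timestamp, spl_db) that passed the classifier filter.
    Returns {bin_index: clip_id}.
    """
    best = {}
    for clip_id, ts, spl_db in clips:
        b = bin_index(ts)
        if b not in best or spl_db < best[b][1]:
            best[b] = (clip_id, spl_db)
    return {b: clip_id for b, (clip_id, _) in best.items()}
```

With a full year of filtered clips, `quietest_per_bin` returns one background clip per bin, up to 96 in total.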
Foreground material from FSD50K
We follow the same filtering process as in FSD-MIX-SED to obtain the subset of FSD50K consisting of short, single-labeled clips. In addition, we remove the two classes "Chirp_and_tweet" and "Walk_and_footsteps", which also occur in our SONYC background recordings. This leaves 87 sound classes. vocab.json contains the list of these 87 classes, and each class is labeled by its index in that list. Classes 0-42 are used for training, 43-56 for validation, and 57-86 for testing (43, 14, and 30 classes, respectively).
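A minimal sketch of recovering the class splits, assuming vocab.json stores a flat JSON list of class names in label order (the helpers `load_vocab` and `split_classes` are hypothetical):

```python
import json

# Index ranges of the class splits (end-exclusive), as listed in the text.
SPLITS = {"train": range(0, 43), "val": range(43, 57), "test": range(57, 87)}


def load_vocab(path="vocab.json"):
    """Read the class list; a class's label is its index in this list."""
    with open(path) as f:
        return json.load(f)


def split_classes(vocab):
    """Group class names into the train/val/test class splits by index."""
    if len(vocab) != 87:
        raise ValueError(f"expected 87 classes, got {len(vocab)}")
    return {name: [vocab[i] for i in idx] for name, idx in SPLITS.items()}
```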
Occurrence probability modelling
For each class, we model its occurrence probability over a year. Because weeks of the year and hours of the day are cyclic, we use von Mises probability density functions to model the distribution over weeks of the year and over hours of the day: \(f(x \mid \mu, \kappa) = \frac{e^{\kappa \cos(x - \mu)}}{2\pi I_0(\kappa)}\), where \(I_0(\kappa)\) is the modified Bessel function of the first kind of order \(0\), and \(\mu\) and \(1/\kappa\) are analogous to the mean and variance of a normal distribution. We randomly sample \(\mu_{year}\) and \(\mu_{day}\) from \([-\pi, \pi]\), and \(\kappa_{year}\) and \(\kappa_{day}\) from \([0, 10]\). To model the distribution over days of the week, we also randomly draw \(p_{weekday} \in [0, 1]\) and set \(p_{weekend} = 1 - p_{weekday}\). Finally, we compute the probability distribution over the entire year at a 1-hour resolution: for a given timestamp, we integrate \(f_{year}\) and \(f_{day}\) over the 1-hour window and multiply the two results by \(p_{weekday}\) or \(p_{weekend}\), depending on the day. To speed up the subsequent sampling process, we scale the final probability distribution with a temperature parameter randomly sampled from \([2, 3]\).
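The following sketch computes one class's hourly occurrence probability as described above. It is an illustration under assumptions, not the authors' implementation: parameters are drawn uniformly from the stated ranges, positions in the year and day are mapped linearly onto \([-\pi, \pi]\), the year is taken to be the non-leap year 2017 starting at midnight on 1 January, Saturday and Sunday count as the weekend, and the window integrals use a midpoint rule. The temperature scaling is omitted because the text does not specify its exact form. All function names are hypothetical.

```python
import math
import random
from datetime import datetime, timedelta

# Assumed reference year: 2017 (non-leap), the year the SONYC backgrounds were recorded.
YEAR_START = datetime(2017, 1, 1)
HOURS_PER_YEAR = 365 * 24


def bessel_i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for m in range(terms):
        if m > 0:
            term *= (kappa / 2) ** 2 / (m * m)  # ratio of consecutive series terms
        total += term
    return total


def integrate_window(start, end, mu, kappa, steps=32):
    """Midpoint-rule integral over [start, end] (radians) of the von Mises density
    f(x | mu, kappa) = exp(kappa * cos(x - mu)) / (2 * pi * I0(kappa))."""
    norm = 2 * math.pi * bessel_i0(kappa)
    width = (end - start) / steps
    total = sum(math.exp(kappa * math.cos(start + (i + 0.5) * width - mu)) for i in range(steps))
    return width * total / norm


def to_angle(fraction):
    """Assumed convention: map a position in a cycle (0 = start, 1 = end) onto [-pi, pi]."""
    return -math.pi + 2 * math.pi * fraction


def sample_class_params(rng=None):
    """Draw one class's parameters, assuming uniform sampling over the stated ranges."""
    rng = rng or random.Random()
    return {
        "mu_year": rng.uniform(-math.pi, math.pi),
        "mu_day": rng.uniform(-math.pi, math.pi),
        "kappa_year": rng.uniform(0, 10),
        "kappa_day": rng.uniform(0, 10),
        "p_weekday": rng.uniform(0, 1),  # p_weekend = 1 - p_weekday
    }


def hourly_probability(hour_of_year, params):
    """Occurrence probability of one class in one 1-hour window (no temperature scaling)."""
    # Integrate f_year over this hour's slice of the year.
    p_year = integrate_window(
        to_angle(hour_of_year / HOURS_PER_YEAR),
        to_angle((hour_of_year + 1) / HOURS_PER_YEAR),
        params["mu_year"],
        params["kappa_year"],
    )
    # Integrate f_day over this hour's slice of the day.
    hour_of_day = hour_of_year % 24
    p_day = integrate_window(
        to_angle(hour_of_day / 24),
        to_angle((hour_of_day + 1) / 24),
        params["mu_day"],
        params["kappa_day"],
    )
    # Weight by p_weekday or p_weekend depending on the day of the week.
    is_weekend = (YEAR_START + timedelta(hours=hour_of_year)).weekday() >= 5
    p_week = 1 - params["p_weekday"] if is_weekend else params["p_weekday"]
    return p_year * p_day * p_week


def yearly_profile(params):
    """Hourly occurrence probabilities for the whole year (8,760 values)."""
    return [hourly_probability(h, params) for h in range(HOURS_PER_YEAR)]
```

For instance, `yearly_profile(sample_class_params(random.Random(0)))` returns 8,760 values, one per hour. Computing \(I_0(\kappa)\) by its power series keeps the sketch dependency-free; `scipy.special.i0` would serve equally well.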
Files
SONYC_FSD_SED.source.tar.gz: 96 SONYC backgrounds and 10,158 foreground sounds in `.wav` format. The original file size is 2GB.
SONYC_FSD_SED.annotations.tar.gz: 465,467 JAMS files. The original file size is 57GB.
SONYC_FSD_SED_add_test.annotations.tar.gz: 111,294 JAMS files for additional test data. The original file size is 14GB.
vocab.json: 87 classes.
occ_prob_per_cl.pkl: Occurrence probability for each foreground sound class.
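Each JAMS file holds the strong labels of one soundscape. The sketch below extracts them with the standard `json` module, assuming the usual Scaper JAMS layout: an annotation with namespace `scaper` whose observations have `time` and `duration` fields and a `value` dict with `label` and `role` keys. The helpers `strong_labels` and `load_strong_labels` are hypothetical; the `jams` package's `jams.load` can also read these files.

```python
import json


def strong_labels(jams_text):
    """Return sorted (onset, offset, label) triples for the foreground events of one soundscape.

    Assumes Scaper's JAMS layout: observations under the "scaper" annotation carry
    a "value" dict with "label" and "role" ("foreground" or "background").
    """
    doc = json.loads(jams_text)
    events = []
    for ann in doc["annotations"]:
        if ann.get("namespace") != "scaper":
            continue
        for obs in ann["data"]:
            value = obs["value"]
            if value.get("role") == "foreground":
                onset = obs["time"]
                events.append((onset, onset + obs["duration"], value["label"]))
    return sorted(events)


def load_strong_labels(path):
    """Read one JAMS file from disk and extract its foreground events."""
    with open(path) as f:
        return strong_labels(f.read())
```

If the labels are class names, they can then be mapped to integer indices through their position in vocab.json.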
References
[1] J. P. Bello, C. T. Silva, O. Nov, R. L. DuBois, A. Arora, J. Salamon, C. Mydlarz, and H. Doraiswamy, "SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution," Commun. ACM, 2019.
[2] E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra, "FSD50K: An Open Dataset of Human-Labeled Sound Events," arXiv:2010.00475, 2020.
Created:
2022-09-20



