FSD-2k
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4730389
Description:
Created By Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes
Contact: mathieu.lagrange@cnrs.fr
If used for research, please cite:
@article{gontier2021training,
title={Polyphonic training set synthesis improves self-supervised urban sound classification},
author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier},
journal={The Journal of the Acoustical Society of America},
year={2021},
publisher={Acoustical Society of America}
}
FSD-2k contains about 200 monophonic audio clips collected from online resources unrelated to the city of Lorient: Freesound for birds and traffic, and LibriSpeech for voice.
The total duration of the dataset is on the order of 2.4k seconds, i.e., 40 minutes. Each audio sample is cut into one or more 3-second parts, and each part yields a spectrogram of size 23x29, giving a dataset of 609 spectrograms. Low-volume amorphous background noise is added to each part, and a cut audio sample shorter than 3 seconds is centered within the 3-second part.
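The cutting and centering step described above can be sketched as follows. This is only an illustration, not the authors' preprocessing code: the function name `split_and_center` is hypothetical, lengths are given in samples because the sampling rate is not stated here, and zeros stand in for the low-volume background noise.

```python
import numpy as np

def split_and_center(clip, part_len):
    """Cut a 1-D signal into parts of part_len samples.

    A last part shorter than part_len is centered inside a
    part_len-sample buffer (hypothetical helper, for illustration).
    """
    parts = []
    for start in range(0, len(clip), part_len):
        part = clip[start:start + part_len]
        missing = part_len - len(part)
        if missing > 0:
            left = missing // 2
            # Zeros are a stand-in for the background noise of the dataset.
            part = np.pad(part, (left, missing - left))
        parts.append(part)
    return np.stack(parts)

# Toy example: a 7-sample signal cut into parts of 3 samples.
parts = split_and_center(np.arange(1, 8, dtype=float), part_len=3)
print(parts.shape)  # (3, 3)
print(parts[-1])    # [0. 7. 0.]
```

With real audio, `part_len` would be 3 seconds times the sampling rate of the recordings.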
>>> import numpy as np
>>> s = np.load('FSD-2k_train_spectralData.npy')
>>> print(s.shape)
(609, 23, 29)
The three dimensions correspond, respectively, to the sceneId, the frameId (time), and the spectralId (frequency).
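As a quick illustration of this axis order, the sketch below slices an array of the documented shape. A zero-filled placeholder replaces the real file so that the snippet runs on its own; the scene, frame, and band indices are arbitrary.

```python
import numpy as np

# Placeholder with the documented shape. With the dataset at hand, use:
# s = np.load('FSD-2k_train_spectralData.npy')
s = np.zeros((609, 23, 29))

n_scenes, n_frames, n_bands = s.shape

scene = s[0]              # one scene, time x frequency: (23, 29)
frame = s[0, 5]           # one time frame of that scene: (29,)
band_track = s[0, :, 10]  # one frequency band over time: (23,)

# Transpose to frequency x time, a common layout for plotting.
scene_ft = scene.T        # (29, 23)

print(scene.shape, frame.shape, band_track.shape, scene_ft.shape)
```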
>>> a = np.load('FSD-2k_train_presence.npy')
>>> print(a.shape)
(609, 16, 3)
The three dimensions correspond to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). Each annotation is a binary indicator of source presence over a one-second window, that is, 8 consecutive 125 ms frames, with a hop of one frame between windows.
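The 16 annotation frames are consistent with sliding an 8-frame window over the 23 spectral frames with a hop of one frame: 23 - 8 + 1 = 16. The sketch below checks this count and shows one way to read the presence array. It assumes that annotation window j starts at spectral frame j, which is not stated explicitly; `window_frames` is a hypothetical helper, and the presence array is a placeholder with one hand-set label.

```python
import numpy as np

N_SPECTRAL_FRAMES = 23  # frames per spectrogram
WINDOW = 8              # 8 frames of 125 ms = 1 s
HOP = 1                 # hop of one frame

n_windows = (N_SPECTRAL_FRAMES - WINDOW) // HOP + 1
print(n_windows)  # 16

# Source order as documented for the last axis.
SOURCES = ("traffic", "voice", "birds")

def window_frames(j):
    """Spectral frame indices covered by annotation window j.

    Assumes window j starts at spectral frame j * HOP.
    """
    start = j * HOP
    return range(start, start + WINDOW)

# Placeholder with the documented shape. With the dataset at hand, use:
# a = np.load('FSD-2k_train_presence.npy')
a = np.zeros((609, 16, 3), dtype=int)
a[0, 4, SOURCES.index("birds")] = 1  # hand-set label, for illustration only

# Scene-level labels: a source counts as present if any window flags it.
scene_level = a.max(axis=1)  # (609, 3)
print(list(window_frames(4)))  # [4, 5, 6, 7, 8, 9, 10, 11]
print(scene_level[0])          # [0 0 1]
```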
Created:
2021-06-02



