FSD-2k

Indexed from NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/4730389
Description:
Created by Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes.
Contact: mathieu.lagrange@cnrs.fr

If used for research, please refer to:

@article{gontier2021training,
  title={Polyphonic training set synthesis improves self-supervised urban sound classification},
  author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier},
  journal={The Journal of the Acoustical Society of America},
  year={2021},
  publisher={Acoustical Society of America}
}

FSD-2k contains about 200 monophonic audio clips collected from online resources unrelated to the city of Lorient: Freesound for birds and traffic, and Librispeech for voice. The total duration of the dataset is on the order of 2.4k seconds, i.e., 40 minutes. Each audio sample is cut into one or several 3-second parts, each resulting in a spectrogram of size 23x29, leading to a dataset of 609 spectrograms. Low-volume amorphic background noise is added to each part, and the cut audio sample is centered within the 3 seconds if it is shorter.

>>> import numpy as np
>>> s = np.load('FSD-2k_train_spectralData.npy')
>>> print(s.shape)
(609, 23, 29)

The three dimensions correspond, respectively, to the sceneId, the frameId (time), and the spectralId (frequency).

>>> a = np.load('FSD-2k_train_presence.npy')
>>> print(a.shape)
(609, 16, 3)

The three dimensions correspond to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). Annotation is provided as a binary indicator of source presence over one second, that is, 8 consecutive 125 ms frames, with a hop of one frame.
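The two array shapes above can be related with a short sketch. It assumes (this is not stated explicitly in the description) that the 16 annotation frames are the sliding one-second windows of 8 frames, with a hop of one frame, taken over the 23 spectrogram frames, since 23 - 8 + 1 = 16. The helper names `n_windows` and `summarize` are hypothetical and not part of the dataset; the source order is taken from the description.

```python
import numpy as np

FRAMES_PER_WINDOW = 8  # one second = 8 frames of 125 ms (from the description)
SOURCES = ("traffic", "voice", "birds")  # sourceId order as listed in the description


def n_windows(n_frames, window=FRAMES_PER_WINDOW, hop=1):
    """Number of sliding windows of `window` frames that fit in `n_frames`."""
    return (n_frames - window) // hop + 1


def summarize(spectra, presence):
    """Check that the two arrays are consistent (under the sliding-window
    assumption) and return the fraction of annotation windows in which
    each source is marked present."""
    n_scenes, n_frames, _ = spectra.shape
    expected = (n_scenes, n_windows(n_frames), len(SOURCES))
    if presence.shape != expected:
        raise ValueError(f"presence shape {presence.shape} != expected {expected}")
    # Average the binary indicator over all (scene, window) pairs.
    rates = presence.reshape(-1, len(SOURCES)).mean(axis=0)
    return dict(zip(SOURCES, rates.tolist()))


# Usage with the real files (file names from the description):
#   spectra = np.load("FSD-2k_train_spectralData.npy")   # (609, 23, 29)
#   presence = np.load("FSD-2k_train_presence.npy")      # (609, 16, 3)
#   print(summarize(spectra, presence))
```

If the shape check fails on the real files, the sliding-window assumption is wrong and the annotation layout should be checked against the referenced article.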
Created:
2021-06-02