augCENSE-18k
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4733680
Resource description:
Created By Félix Gontier and Mathieu Lagrange, LS2N, CNRS, Ecole Centrale Nantes
Contact: mathieu.lagrange@cnrs.fr
If used for research, please refer to:
@article{gontier2021training,
title={Polyphonic training set synthesis improves self-supervised urban sound classification},
author={Félix Gontier and Vincent Lostanlen and Mathieu Lagrange and Nicolas Fortin and Jean-Francois Petiot and Catherine Lavandier},
journal={The Journal of the Acoustical Society of America},
year={2021},
publisher={Acoustical Society of America}
}
augCENSE-18k is a derivative of CENSE-2k, obtained by randomly time-stretching and pitch-shifting audio clips of the voice and birds classes.
The total duration of the dataset is 18k seconds, i.e., the same as simCENSE-18k, with material balanced across classes. Each audio sample is cut into one or several 3-second parts, each resulting in a spectrogram of size 23x29, leading to a dataset of 609 spectrograms. Low-volume amorphous background noise recordings are added, and the cut audio sample is centered within the 3 seconds if it is shorter.
>>> import numpy
>>> a = numpy.load('augCENSE-18k_train_spectralData.npy')
>>> a.shape
(4421, 23, 29)
>>> a = numpy.load('augCENSE-18k_train_presence.npy')
>>> a.shape
(4421, 16, 3)
The 3 dimensions correspond to the sceneId, the frameId (time), and the sourceId (traffic, voice, birds). The annotation is a binary indicator of source presence over one second, that is, 8 consecutive 125 ms frames, with a hop of one frame.
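As an illustration of the layout described above, here is a minimal sketch that summarizes a presence array per source. The helper names (n_windows, presence_rates) are hypothetical and not part of the dataset. The sketch also assumes that the 23-sized axis of the spectrograms is time: under that assumption, 8-frame windows with a hop of one frame give 23 - 8 + 1 = 16 positions, which matches the second axis of the presence array, but the description does not state this explicitly. A small synthetic array stands in for the real file.

```python
import numpy as np

# Source order as listed in the dataset description.
SOURCE_NAMES = ("traffic", "voice", "birds")


def n_windows(n_frames, window=8, hop=1):
    """Number of full windows of `window` frames that fit in `n_frames`."""
    return (n_frames - window) // hop + 1


def presence_rates(presence):
    """Fraction of (scene, window) cells in which each source is marked present."""
    presence = np.asarray(presence)
    if presence.ndim != 3 or presence.shape[2] != len(SOURCE_NAMES):
        raise ValueError("expected an array of shape (scenes, windows, 3)")
    # Flatten scenes and windows, then average the binary indicator per source.
    rates = presence.reshape(-1, presence.shape[2]).mean(axis=0)
    return dict(zip(SOURCE_NAMES, rates.tolist()))


# Synthetic stand-in with the same trailing shape as the presence file.
demo = np.zeros((4, 16, 3), dtype=int)
demo[:, :, 0] = 1   # traffic marked present everywhere
demo[:2, :, 1] = 1  # voice marked present in half of the scenes
print(presence_rates(demo))
print(n_windows(23))  # assumes 23 time frames per 3-second clip
```

With the real data, the array returned by numpy.load('augCENSE-18k_train_presence.npy') could be passed to presence_rates in place of the synthetic one.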
Created:
2021-06-02



