Dataset for: Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation
DataCite Commons | Updated 2025-06-10 | Indexed 2025-04-16
Download link:
https://researchdata.ntu.edu.sg/citation?persistentId=doi:10.21979/N9/YSJQKD
Description:
This dataset contains the log-mel spectrograms, in .npy format, for the augmented soundscapes described in our ICASSP 2022 submission "Probably Pleasant? A Neural-Probabilistic Approach to Automatic Masker Selection for Urban Soundscape Augmentation". The files can be read in Python with numpy.load. The dataset is organised as a 5-fold cross-validation set: the log-mel spectrograms for each fold are stored in fold_#_features.npy and the subjective ratings for the corresponding augmented soundscapes in fold_#_labels.npy, where # is the fold number in the set {1,2,3,4,5}. The independent test set has fold index 0.
Generation of augmented soundscapes
Each augmented soundscape was created by adding 30-second excerpts of recordings of sounds known as maskers to binaural recordings of urban soundscapes (element-wise addition in the time domain). Each masker recording has only one class ("construction", "traffic", "water", or "wind") active for its entire duration, whereas each binaural recording of an urban soundscape may have multiple sound sources active at any point in the recording, including sound sources outside of the four masker classes.
Cross-validation set
The masker samples were obtained from Freesound by searching for the names of the masker classes (i.e. "construction", "traffic", "water", and "wind") and randomly picking a selection of tracks containing 30-second sections of sound that corresponded only to that particular masker class.
The soundscape samples were obtained from the Urban Soundscapes of the World (USotW) dataset, and consisted of all binaural recordings available in the public dataset except those meeting any of three exclusion criteria. Recordings with audible electrical noise were excluded so that only accurately captured real-life soundscapes remained. Recordings with measured in-situ LA,eq values below 52 dB were excluded so that reproduction levels were significantly above the noise floor of the location with the highest noise floor (~36 dB) where the subjective responses were obtained. Recordings with measured in-situ LA,eq values above 77 dB were excluded to ensure safe listening levels for our participants. In total, 120 out of the 127 publicly available recordings in the USotW dataset were used for the cross-validation set.
Test set
The masker samples were obtained from Freesound in the same manner as for the cross-validation set, while ensuring that no recordings overlapped between the test set and cross-validation set maskers. The soundscape samples were taken from binaural recordings of locations in Singapore, which was not represented in any of the soundscapes in the USotW dataset and hence not in the cross-validation set. They were recorded under the similar Soundscape Indices Protocol, in urban contexts similar to those of the USotW dataset. Specifically, they were from a road facing a construction site, a gazebo in a park, a walkway facing a lake, a walkway facing a crowded canteen, a path facing a lake, and a path facing a lake with an aircraft flying overhead.
Participant information
The participants of the listening test were a sample of people who were able to visit our laboratory (at Nanyang Technological University, Singapore) in person to listen to the stimuli and provide their responses. Their mean age was 28.4 ± 11.8 years, and there were a total of 151 female and 149 male participants. All participants were tested to have normal hearing, defined as a mean hearing threshold at 0.5, 1, 2, 4, and 6 kHz below 20 dB for participants under 30 years of age, and below 30 dB for participants aged 30 years or above.
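The file naming scheme in the description (fold_#_features.npy and fold_#_labels.npy, with fold 0 as the independent test set) can be read with a few lines of Python. The following is only a minimal sketch, not the authors' code: the helper names fold_paths, load_fold, and cross_validation_split are hypothetical, the test set is assumed to follow the same naming with # = 0, and the arrays are assumed to be plain numeric arrays whose first axis indexes the augmented soundscapes. The description does not state the array shapes, so check them after loading.

```python
from pathlib import Path

import numpy as np

CV_FOLDS = (1, 2, 3, 4, 5)  # cross-validation folds named in the description
TEST_FOLD = 0               # the independent test set has fold index 0


def fold_paths(data_dir, fold):
    """Return (features_path, labels_path) for one fold (hypothetical helper)."""
    if fold not in (TEST_FOLD, *CV_FOLDS):
        raise ValueError(f"fold must be one of 0..5, got {fold!r}")
    data_dir = Path(data_dir)
    return (
        data_dir / f"fold_{fold}_features.npy",
        data_dir / f"fold_{fold}_labels.npy",
    )


def load_fold(data_dir, fold):
    """Load the log-mel spectrograms and subjective ratings of one fold."""
    features_path, labels_path = fold_paths(data_dir, fold)
    return np.load(features_path), np.load(labels_path)


def cross_validation_split(data_dir, val_fold):
    """Hold out `val_fold` for validation and concatenate the other four folds.

    Assumption: axis 0 of every array indexes the augmented soundscapes.
    """
    if val_fold not in CV_FOLDS:
        raise ValueError(f"val_fold must be one of {CV_FOLDS}, got {val_fold!r}")
    train = [load_fold(data_dir, k) for k in CV_FOLDS if k != val_fold]
    train_x = np.concatenate([x for x, _ in train], axis=0)
    train_y = np.concatenate([y for _, y in train], axis=0)
    val_x, val_y = load_fold(data_dir, val_fold)
    return (train_x, train_y), (val_x, val_y)
```

For example, cross_validation_split("path/to/dataset", val_fold=2) would train on folds 1, 3, 4, and 5 and validate on fold 2, while load_fold("path/to/dataset", 0) would return the test set; "path/to/dataset" is a placeholder for wherever the downloaded files are stored.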
Provider:
DR-NTU (Data)
Created:
2021-10-03



