AudioSet-EV: an AudioSet-derived distribution of Emergency Vehicle Siren sounds
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14882313
Description:
AudioSet-EV is a distribution of AudioSet (©Google) (AS) tailored to a specific case study: acoustic detection and recognition of emergency vehicle sirens. By selectively grouping siren and non-siren urban sounds, enforcing taxonomy consistency, and mitigating class imbalances, AudioSet-EV offers a robust, large-scale resource for research in Machine Learning and Deep Learning acoustic modeling.
Methodology
Our design methodology combines a systematic selection and filtering of relevant AS samples, performed with AudioSet-Tools, with a binary distinction between True Positives (siren-related) and True Negatives (non-siren) samples, in order to mitigate class imbalances and label contamination. We emphasize that, because the original AS labels are weak, the label association process cannot be guaranteed to be fully reliable.
We structured AudioSet-EV into two primary groups:
Positives: including only EV-siren-related classes, specifically 'Police car (siren)', 'Ambulance (siren)', 'Fire engine, fire truck (siren)', and the ontology container class 'Emergency vehicle', which is kept to capture weakly labeled but still relevant siren sounds.
Negatives: consisting of a diverse and challenging set, comprising vehicle-related sounds ('Car', 'Car passing by', 'Power windows, electric windows', 'Tire squeal', 'Motor vehicle (road)', 'Truck', 'Air brake', 'Ice cream truck, ice cream van', 'Bus', 'Motorcycle', 'Skidding', 'Race car, auto racing', 'Bicycle', 'Train', 'Rail transport', 'Train wheels squealing', 'Railroad car, train wagon', 'Skateboard'), alarm signals ('Car alarm', 'Vehicle horn, car horn, honking', 'Bicycle bell', 'Train horn', 'Train whistle', 'Foghorn', 'Toot', 'Reversing beeps', 'Beep, bleep', 'Civil defense siren', 'Alarm', 'Smoke detector, smoke alarm', 'Fire alarm', 'Buzzer'), and environmental noises ('Traffic noise, roadway noise', 'Outside, rural or natural', 'Outside, urban or manmade'). We also included some Speech, Music, and Engine-related sounds to improve robustness against sounds that resemble sirens in their waveform patterns or sit close to them in the taxonomy.
Pre-Processing
For the Positives category, segment processing followed these steps:
Selection by Label: balanced, unbalanced, and eval AS segments were filtered according to our Positives label selection.
Segments Merging: because the matching samples were scarce and scattered across splits, the resulting intermediate .csv files were aggregated into a single, more consistent set.
Blacklist Filtering: to refine our selection, any 'Civil defense siren' sample was removed to prevent contamination with non-emergency vehicle sounds.
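The three Positives steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AudioSet-Tools implementation: the `Segment` type, the helper names, and the toy rows are hypothetical, and labels are written as display names for readability, whereas the AudioSet .csv files store ontology IDs. The de-duplication in `merge` is a defensive choice of this sketch.

```python
from dataclasses import dataclass

# Label names are taken from the text; everything else is illustrative.
POSITIVE_LABELS = {
    "Emergency vehicle",
    "Police car (siren)",
    "Ambulance (siren)",
    "Fire engine, fire truck (siren)",
}
BLACKLIST = {"Civil defense siren"}


@dataclass(frozen=True)
class Segment:
    ytid: str          # YouTube video ID
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    labels: frozenset  # weak, clip-level labels


def select_by_label(segments, wanted):
    """Keep segments that carry at least one wanted label."""
    return [s for s in segments if s.labels & wanted]


def merge(*subsets):
    """Concatenate subsets, dropping repeated (ytid, start) pairs."""
    seen, merged = set(), []
    for subset in subsets:
        for s in subset:
            key = (s.ytid, s.start)
            if key not in seen:
                seen.add(key)
                merged.append(s)
    return merged


def drop_blacklisted(segments, blacklist):
    """Remove segments that carry any blacklisted label."""
    return [s for s in segments if not (s.labels & blacklist)]


def build_positives(balanced, unbalanced, evaluation):
    splits = (balanced, unbalanced, evaluation)
    selected = [select_by_label(split, POSITIVE_LABELS) for split in splits]
    return drop_blacklisted(merge(*selected), BLACKLIST)


def seg(ytid, start, *labels):
    """Build a 10-second toy segment (hypothetical data)."""
    return Segment(ytid, start, start + 10.0, frozenset(labels))


balanced = [seg("vidA", 0.0, "Ambulance (siren)"), seg("vidB", 30.0, "Car")]
unbalanced = [
    seg("vidC", 10.0, "Emergency vehicle", "Civil defense siren"),
    seg("vidA", 0.0, "Ambulance (siren)"),
]
evaluation = [seg("vidD", 5.0, "Police car (siren)", "Speech")]

positives = build_positives(balanced, unbalanced, evaluation)
```

In this toy run, "vidB" is never selected, the repeated "vidA" row is merged away, and "vidC" is removed by the blacklist even though it also carries a positive label.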
For the Negatives category, dataset processing followed these steps:
Selection by Label: balanced, unbalanced, and eval AS entries, matching our defined non-siren labels, were extracted.
Segments Merging: extracted negative subsets were merged to consolidate a unique non-siren set.
Partial Blacklist Filtering: to avoid overlaps with the Positives category, samples containing at least one positive class label were removed, except for 'Civil defense siren', which is taxonomically included within the 'Siren' container class.
Class Re-Balancing: to minimize imbalance among ontology child leaf classes, label occurrence counts were equalized while preserving dataset diversity. Full class uniformity is not feasible due to the ontological structure of AS and the presence of weakly multi-labeled entries.
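The last two Negatives steps can be illustrated with a small sketch. It is an assumption-laden example rather than the authors' procedure: the function names, the `cap` and `seed` parameters, and the toy samples are hypothetical, and the re-balancing rule shown here (shuffle, then keep a sample only while all of its labels are below a per-label cap) is just one simple way to limit label counts when entries are multi-labeled.

```python
import random
from collections import Counter

# Positive labels from the text. 'Civil defense siren' is deliberately
# absent from this blacklist, so such samples stay in the Negatives.
POSITIVE_LABELS = {
    "Emergency vehicle",
    "Police car (siren)",
    "Ambulance (siren)",
    "Fire engine, fire truck (siren)",
}


def partial_blacklist(samples):
    """Drop every sample that carries at least one positive label."""
    return [
        (ytid, labels)
        for ytid, labels in samples
        if not (set(labels) & POSITIVE_LABELS)
    ]


def rebalance(samples, cap, seed):
    """Randomly down-sample so that no label occurs more than `cap` times.

    Assumption of this sketch: a sample is kept only if all of its
    labels are still below the cap when it is visited.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    counts = Counter()
    kept = []
    for ytid, labels in shuffled:
        if all(counts[label] < cap for label in labels):
            kept.append((ytid, labels))
            counts.update(labels)
    return kept, counts


# Hypothetical toy data: (ytid, labels) pairs.
negatives_raw = (
    [(f"car{i}", ("Car",)) for i in range(6)]
    + [(f"bus{i}", ("Bus", "Traffic noise, roadway noise")) for i in range(2)]
    + [("cds0", ("Civil defense siren",)), ("amb0", ("Car", "Ambulance (siren)"))]
)

filtered = partial_blacklist(negatives_raw)
kept, counts = rebalance(filtered, cap=3, seed=0)
```

Because the down-sampling is randomized, a different `seed` can keep a different subset of the over-represented class, while the per-label counts stay bounded by `cap`.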
The final .csv files were processed through two independent instances of our AudioSet-Tools downloader, configured to re-sample YouTube audio to 32 kHz, downmix files to mono, and skip amplitude normalization. Note that, because the Negatives pool is large and the class down-sampling is randomized, multiple distinct instances of this subset exist.
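For readers who want to reproduce the same audio format outside AudioSet-Tools, the conversion can be approximated with ffmpeg. This is a sketch under assumptions: the text does not say which backend the downloader uses, the file names are hypothetical, and the helper functions are illustrative. The flags themselves are standard ffmpeg options (`-ar` sets the sampling rate, `-ac` the channel count, `-vn` drops video, `-y` overwrites the output).

```python
import subprocess


def ffmpeg_command(src, dst, sample_rate=32000, channels=1):
    """Build an ffmpeg call that re-samples and downmixes an audio file.

    No loudness or peak-normalization filter is added, so amplitudes
    are not normalized.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-vn",
        "-ar", str(sample_rate),
        "-ac", str(channels),
        str(dst),
    ]


def convert(src, dst):
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(ffmpeg_command(src, dst), check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
command = ffmpeg_command("clip_source.m4a", "clip_32k_mono.wav")
```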
Summary Statistics
                  | Samples | Emergency_Vehicle | Siren | Police car (siren) | Ambulance (siren) | Fire engine, fire truck (siren)
Positives         |    8409 |              5700 |  4352 |               3643 |              1931 |                            3187
Downloaded        |    7324 |              4972 |  3768 |               3124 |              1637 |                            2852
Difference (abs.) |    1085 |               728 |   584 |                519 |               294 |                             335
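The "Difference (abs.)" row is the per-column gap between the selected Positives and the files actually downloaded. The short check below recomputes it from the figures above and derives the downloaded fraction per column; the variable names are illustrative and the fraction is not a statistic reported in the original description.

```python
COLUMNS = [
    "Samples",
    "Emergency_Vehicle",
    "Siren",
    "Police car (siren)",
    "Ambulance (siren)",
    "Fire engine, fire truck (siren)",
]
POSITIVES = [8409, 5700, 4352, 3643, 1931, 3187]
DOWNLOADED = [7324, 4972, 3768, 3124, 1637, 2852]
REPORTED_DIFFERENCE = [1085, 728, 584, 519, 294, 335]

# Absolute difference: selected minus downloaded, column by column.
differences = [p - d for p, d in zip(POSITIVES, DOWNLOADED)]

# Fraction of selected entries that were successfully downloaded.
retention = {
    name: d / p for name, p, d in zip(COLUMNS, POSITIVES, DOWNLOADED)
}

for name in COLUMNS:
    print(f"{name}: {retention[name]:.1%} downloaded")
```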
References
S. Giacomelli et al. - "AudioSet-Tools: a Python Research Framework for Custom AudioSet Distributing and Processing" (under peer-review)
GitHub: Dataset folder - https://github.com/StefanoGiacomelli/audioset-tools/tree/main/EV-benchmark/AudioSet-EV
AudioSet-Tools: https://github.com/StefanoGiacomelli/audioset-tools/
Created:
2025-02-17



