AudioSet-EV: an AudioSet-derived distribution of Emergency Vehicle Siren sounds

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/14882313

Resource summary:

AudioSet-EV is a case-study-tailored distribution of AudioSet (AS, ©Google) for acoustic emergency vehicle (EV) siren detection and recognition. By selectively grouping siren and non-siren urban sounds, enforcing taxonomy consistency, and mitigating class imbalances, AudioSet-EV offers a robust, large-scale resource for Machine Learning and Deep Learning research on acoustic modeling.

Methodology

Our design methodology consists of a systematic selection and filtering of relevant AS samples with AudioSet-Tools, combined with a binary distinction between True Positives (siren-related) and True Negatives (non-siren) samples, to mitigate class imbalances and label contamination. Given the weakly labeled nature of AS, full reliability of the label association process cannot be guaranteed.

AudioSet-EV is structured into two primary groups:

Positives: only EV-siren-related classes, namely 'Police car (siren)', 'Ambulance (siren)', 'Fire engine, fire truck (siren)', and the ontology container class 'Emergency vehicle', which is included to capture weakly labeled yet meaningful siren sounds.

Negatives: a diverse and challenging set comprising vehicle-related sounds ('Car', 'Car passing by', 'Power windows, electric windows', 'Tire squeal', 'Motor vehicle (road)', 'Truck', 'Air brake', 'Ice cream truck, ice cream van', 'Bus', 'Motorcycle', 'Skidding', 'Race car, auto racing', 'Bicycle', 'Train', 'Rail transport', 'Train wheels squealing', 'Railroad car, train wagon', 'Skateboard'), alarm signals ('Car alarm', 'Vehicle horn, car horn, honking', 'Bicycle bell', 'Train horn', 'Train whistle', 'Foghorn', 'Toot', 'Reversing beeps', 'Beep, bleep', 'Civil defense siren', 'Alarm', 'Smoke detector, smoke alarm', 'Fire alarm', 'Buzzer'), and environmental noises ('Traffic noise, roadway noise', 'Outside, rural or natural', 'Outside, urban or manmade').
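The binary Positives/Negatives split can be illustrated with a short Python sketch. It is not the AudioSet-Tools implementation: it assumes each segment's labels are already available as AS display names (the official AS segment CSVs store ontology MIDs, which would need mapping first), it uses only a subset of the Negatives labels, and it folds in the 'Civil defense siren' blacklist rules from the Pre-Processing steps. The names POSITIVE_LABELS, NEGATIVE_LABELS, POSITIVE_BLACKLIST, and assign_group are hypothetical.

```python
# Hypothetical sketch of the Positives/Negatives split for one AudioSet segment.
# Assumption: labels are AudioSet display names, not ontology MIDs.
POSITIVE_LABELS = frozenset({
    "Police car (siren)",
    "Ambulance (siren)",
    "Fire engine, fire truck (siren)",
    "Emergency vehicle",
})

# Illustrative subset of the Negatives labels only; the full list is given above.
NEGATIVE_LABELS = frozenset({
    "Car", "Truck", "Bus", "Train horn", "Car alarm", "Civil defense siren",
    "Traffic noise, roadway noise", "Outside, urban or manmade",
})

# Removed from the Positives, but still allowed in the Negatives.
POSITIVE_BLACKLIST = frozenset({"Civil defense siren"})


def assign_group(labels):
    """Return "positive", "negative", or None for a segment's (weak) label set."""
    labels = set(labels)
    if labels & POSITIVE_LABELS:
        # Siren-related segment: keep it only if no blacklisted label is present.
        # A blacklisted one cannot be a Negative either, since it has a positive label.
        return None if labels & POSITIVE_BLACKLIST else "positive"
    if labels & NEGATIVE_LABELS:
        # No positive label at all: eligible for the Negatives pool.
        return "negative"
    # Label set outside the AudioSet-EV selection.
    return None


print(assign_group({"Ambulance (siren)", "Truck"}))                 # positive
print(assign_group({"Civil defense siren"}))                        # negative
print(assign_group({"Police car (siren)", "Civil defense siren"}))  # None
```

Checking positive labels first means a segment carrying both siren and non-siren labels is never placed in the Negatives, which is the contamination the split is meant to avoid.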
Some Speech, Music, and Engine-related sounds were also included to improve robustness against similar waveform patterns and close taxonomic proximity.

Pre-Processing

For the Positives category, segments were processed as follows:
Selection by Label: balanced, unbalanced, and eval AS segments were filtered according to the Positives label selection.
Segments Merging: given the scarcity and sparsity of matches, samples were aggregated across the resulting intermediate .csv files for greater consistency.
Blacklist Filtering: to refine the selection, any sample labeled 'Civil defense siren' was removed to prevent contamination with non-emergency-vehicle sounds.

For the Negatives category, processing followed these steps:
Selection by Label: balanced, unbalanced, and eval AS entries matching the defined non-siren labels were extracted.
Segments Merging: the extracted negative subsets were merged into a single non-siren set.
Partial Blacklist Filtering: to avoid overlap with the Positives category, samples carrying at least one positive class label were removed; 'Civil defense siren' samples were retained, even though that class falls taxonomically within the 'Siren' container class.
Class Re-Balancing: to minimize imbalance among ontology child (leaf) classes, label occurrences were equalized by down-sampling while preserving dataset diversity. Full class uniformity is not achievable, owing to the ontological structure of AS and the presence of weakly multi-labeled entries.

The final .csv files were processed by two independent instances of our AudioSet-Tools downloader, configured to resample YouTube audio to 32 kHz, downmix to mono, and skip amplitude normalization. Note that, because of the large number of Negatives and the randomized class down-sampling, multiple instances of the Negatives subset actually exist.
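The merging and re-balancing steps can be sketched as follows. The exact AudioSet-Tools down-sampling procedure is not described here, so the greedy per-label cap below is an assumed strategy, not the authors' algorithm; merge_splits, downsample, the cap parameter, and the toy YouTube IDs are hypothetical.

```python
import random
from collections import Counter


def merge_splits(*splits):
    """Merge per-split lists of (ytid, start_s, labels) into one dict keyed by
    (ytid, start_s), so a segment listed in several splits is kept only once."""
    merged = {}
    for split in splits:
        for ytid, start_s, labels in split:
            merged.setdefault((ytid, start_s), frozenset(labels))
    return merged


def downsample(segments, cap, seed=0):
    """Hypothetical greedy re-balancing: visit segments in random order and keep
    one only while none of its labels has reached `cap` occurrences.
    Multi-labeled segments may prevent exact per-label uniformity."""
    rng = random.Random(seed)
    order = sorted(segments)  # fixed starting order, so each seed is reproducible
    rng.shuffle(order)
    counts = Counter()
    kept = {}
    for key in order:
        labels = segments[key]
        if all(counts[label] < cap for label in labels):
            kept[key] = labels
            counts.update(labels)  # one occurrence per label of the kept segment
    return kept, counts


# Toy, made-up segments standing in for balanced/unbalanced split entries.
balanced = [("yt_a", 30.0, {"Car", "Truck"}), ("yt_b", 10.0, {"Car"})]
unbalanced = [("yt_a", 30.0, {"Car", "Truck"}), ("yt_c", 0.0, {"Car"}),
              ("yt_d", 0.0, {"Truck"})]
negatives = merge_splits(balanced, unbalanced)
subset_0, counts_0 = downsample(negatives, cap=1, seed=0)
subset_1, counts_1 = downsample(negatives, cap=1, seed=1)
```

Calling downsample with different seed values can yield different, equally valid Negatives subsets, which mirrors the note that several instances of the Negatives subset exist.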
Summary Statistics

                   Samples  Emergency vehicle  Siren  Police car (siren)  Ambulance (siren)  Fire engine, fire truck (siren)
Positives          8409     5700               4352   3643                1931               3187
Downloaded         7324     4972               3768   3124                1637               2852
Difference (abs.)  1085     728                584    519                 294                335

References

S. Giacomelli et al., "AudioSet-Tools: a Python Research Framework for Custom AudioSet Distributing and Processing" (under peer review).
GitHub, dataset folder: https://github.com/StefanoGiacomelli/audioset-tools/tree/main/EV-benchmark/AudioSet-EV
AudioSet-Tools: https://github.com/StefanoGiacomelli/audioset-tools/
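As a consistency check on the Summary Statistics table, the Difference (abs.) row is Positives minus Downloaded for each column. The short sketch below (the helper name download_report is hypothetical) recomputes those differences and the fraction of selected Positives that were actually downloaded.

```python
# Counts copied from the Summary Statistics table, in column order.
COLUMNS = ["Samples", "Emergency vehicle", "Siren", "Police car (siren)",
           "Ambulance (siren)", "Fire engine, fire truck (siren)"]
POSITIVES = [8409, 5700, 4352, 3643, 1931, 3187]
DOWNLOADED = [7324, 4972, 3768, 3124, 1637, 2852]


def download_report(selected, downloaded):
    """Return (missing count, retained fraction) for each column."""
    return [(s - d, d / s) for s, d in zip(selected, downloaded)]


for name, (missing, frac) in zip(COLUMNS, download_report(POSITIVES, DOWNLOADED)):
    print(f"{name:32s} missing={missing:5d} retained={frac:.1%}")
```

The recomputed differences match the Difference (abs.) row, and in every column roughly 85% to 90% of the selected Positives were downloaded.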

Created:

2025-02-17
