SONYC-Backgrounds: a collection of urban background recordings from an acoustic sensor network

Mendeley Data. Updated 2024-03-27; indexed 2024-06-27.

Download link:

https://zenodo.org/record/5129078

Resource overview:

Created by Aurora Cramer (1, 2), Mark Cartwright (3), Fatemeh Pishdadian (4), Juan Pablo Bello (1, 2, 5, 6)

1. Music and Audio Research Lab, New York University
2. Department of Electrical and Computer Engineering, New York University
3. Department of Informatics, New Jersey Institute of Technology
4. Interactive Audio Lab, Northwestern University
5. Center for Urban Science and Progress, New York University
6. Department of Computer Science and Engineering, New York University

### Publication

If you use this data in your work, please cite the following paper, which introduced this dataset:

[1] Cramer, A., Cartwright, M., Pishdadian, F., and Bello, J.P. Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021.

### Description

SONYC-Backgrounds is an open dataset of recordings of urban background noise obtained from the SONYC acoustic sensor network [2]. This dataset was developed with the goal of synthesizing soundscapes with a diverse set of realistic-sounding background activity, for use in developing and evaluating machine listening systems in urban settings.

### Data acquisition

The provided audio has been acquired using the SONYC acoustic sensor network for urban noise pollution monitoring [2]. Over 50 different sensors have been deployed in New York City. All recordings are 10 seconds long and were recorded with identical microphones at identical gain settings.

### Recording selection

From the large collection of audio recordings acquired in 2017, we obtain a much smaller subset of likely background recordings. We first process the dataset using a sensor fault detector to filter out recordings with artifacts caused by hardware failures in the sensors. The sensor fault detector is a random forest, trained with a small collection of audio examples using active learning [3].

We then determine whether a recording is background using an urban sound classifier trained to detect the presence of sources of interest to urban noise pollution monitoring [4, 5]. We use the classifier to find recordings that *do not* contain the sound classes of interest. The classifier model is a multi-layer perceptron with two hidden layers, which takes as input an OpenL3 embedding [6] for a 1 s clip of audio and produces multi-label prediction probabilities for each class. This model is nearly identical to the one used for the DCASE 2019 Challenge Urban Sound Tagging Task baseline model, aside from the addition of an extra hidden layer. Predictions for entire recordings are obtained by max-pooling the predictions for each class across time. A recording is considered background if the probabilities of all target classes fall below their respective detection thresholds, i.e. no target classes are detected. The classifier was trained on the SONYC-UST v1 dataset [4], and the detection thresholds for each class were tuned to correspond to 70% negative recall (true negative rate) on the test set to increase the likelihood that recordings are background. After this selection process, we obtain 441 background clips.

### Metadata

To maintain privacy, the recordings in this release have been distributed in time and location, and recording times have been quantized to the hour. Sensor IDs are consistent with those in the SONYC-UST dataset [4]. The corresponding locations of the sensors can be found in the SONYC-UST v2 dataset [5], though these locations have been mapped to the "block" level to maintain privacy. See the DCASE 2020 Challenge Urban Sound Tagging with Spatiotemporal Context Task page for more information on the metadata.

### Data splits

The dataset is partitioned into a train/valid/test split of roughly 60/20/20, using a simple greedy method to assign sensors to subsets.

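The description does not spell out the greedy method, so the following is only one plausible sketch of a sensor-level split and not the authors' procedure. It assumes that sensors are placed largest first and that each sensor goes to the subset currently furthest below its target clip count. The function name `greedy_sensor_split` and the clip counts in the example are hypothetical.

```python
def greedy_sensor_split(clips_per_sensor, targets=None):
    """Assign whole sensors to subsets so that clip counts approach target fractions.

    clips_per_sensor: dict mapping sensor ID -> number of clips from that sensor.
    targets: dict mapping subset name -> desired fraction of all clips.
    Returns (assignment, counts): sensor ID -> subset name, subset name -> clip count.
    """
    if targets is None:
        targets = {"train": 0.6, "valid": 0.2, "test": 0.2}
    total = sum(clips_per_sensor.values())
    counts = {name: 0 for name in targets}
    assignment = {}
    # Assumption: handle the largest sensors first; ties are broken by sensor ID
    # so that the result is deterministic.
    ordered = sorted(clips_per_sensor.items(), key=lambda kv: (-kv[1], kv[0]))
    for sensor, n_clips in ordered:
        # Choose the subset with the largest remaining deficit relative to its target.
        subset = max(targets, key=lambda s: targets[s] * total - counts[s])
        assignment[sensor] = subset
        counts[subset] += n_clips
    return assignment, counts


# Made-up clip counts per sensor, for illustration only.
example_counts = {"01": 12, "02": 9, "03": 8, "04": 6, "05": 5, "06": 4}
assignment, subset_sizes = greedy_sensor_split(example_counts)
print(assignment)
print(subset_sizes)
```

Because every clip from a sensor lands in the same subset, the resulting proportions can only approximate 60/20/20, which is consistent with the "roughly" in the description.
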
### Files

The dataset directory contains the directories `train`, `valid`, and `test`, one for each data subset. Each directory contains recordings named in the format `<sensor-id>_<year>-<month>-<day>_<hour>_<instance-num>.wav`, where `<instance-num>` distinguishes recordings from the same sensor made during the same hour. Aside from `<year>`, each field is zero-padded to two places (i.e. `printf` format `"%02d"`).

### Conditions of use

Dataset created by Aurora Cramer, Mark Cartwright, Fatemeh Pishdadian, and Juan Pablo Bello.

The SONYC-Backgrounds dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license: https://creativecommons.org/licenses/by/4.0/

The dataset and its contents are made available on an “as is” basis and without warranties of any kind, including without limitation satisfactory quality and conformity, merchantability, fitness for a particular purpose, accuracy or completeness, or absence of errors. Subject to any liability that may not be excluded or limited by law, New York University is not liable for, and expressly excludes all liability for, loss or damage however and whenever caused to anyone by any use of the SONYC-Backgrounds dataset or any part of it.

### Contact

If you have any questions, comments, or concerns, please direct correspondence to Aurora Cramer (aurora (dot) linh (dot) cramer (at) gmail (dot) com).

### References and Links

[1] Cramer, A., Cartwright, M., Pishdadian, F., and Bello, J.P. Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021.

[2] Bello, J.P., Silva, C., Nov, O., Dubois, R.L., Arora, A., Salamon, J., Mydlarz, C., and Doraiswamy, H. SONYC: A System for Monitoring, Analyzing, and Mitigating Urban Noise Pollution. Communications of the ACM, 62(2), 68-77, 2019.

[3] Wang, Y., Mendez, A.E.M., Cartwright, M., and Bello, J.P. Active Learning for Efficient Audio Annotation and Classification with a Large Amount of Unlabeled Data. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

[4] Cartwright, M., Mendez, A.E.M., Cramer, A., Lostanlen, V., Dove, G., Wu, H., Salamon, J., Nov, O., and Bello, J.P. SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.

[5] Cartwright, M., Cramer, A., Mendez, A.E.M., Wang, Y., Wu, H., Lostanlen, V., Fuentes, M., Dove, G., Mydlarz, C., Salamon, J., Nov, O., and Bello, J.P. SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

[6] Cramer, A., Wu, H.-H., Salamon, J., and Bello, J.P. Look, Listen and Learn More: Design Choices for Deep Audio Embeddings. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

### Acknowledgements

We would like to thank all those involved in the SONYC project. This work is partially supported by National Science Foundation award 1633259 and award 1544753.
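
The file naming convention described under Files can be handled with a short parser. This is a minimal sketch rather than an official tool: it assumes a four-digit `<year>` and purely numeric fields (as the `"%02d"` note suggests), and the names `RecordingName`, `parse_recording_name`, `format_recording_name`, and the example file name are hypothetical.

```python
import re
from typing import NamedTuple


class RecordingName(NamedTuple):
    sensor_id: int
    year: int
    month: int
    day: int
    hour: int
    instance_num: int


# Assumption: the year has four digits; every other field has at least two
# digits because it is zero-padded with "%02d".
FILENAME_RE = re.compile(r"(\d{2,})_(\d{4})-(\d{2})-(\d{2})_(\d{2})_(\d{2,})\.wav")


def parse_recording_name(filename: str) -> RecordingName:
    """Split '<sensor-id>_<year>-<month>-<day>_<hour>_<instance-num>.wav' into fields."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return RecordingName(*(int(group) for group in match.groups()))


def format_recording_name(name: RecordingName) -> str:
    """Rebuild the file name, zero-padding every field except the year."""
    return "%02d_%d-%02d-%02d_%02d_%02d.wav" % name


# Hypothetical file name that follows the documented pattern.
parsed = parse_recording_name("05_2017-03-09_14_00.wav")
print(parsed)
print(format_recording_name(parsed))
```

Grouping parsed names by `sensor_id` is one way to confirm that no sensor appears in more than one of the `train`, `valid`, and `test` directories.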

Created:

2023-06-28