Urban Soundscapes of the World
Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/10106181
Resource description:
The Urban Soundscapes of the World database currently contains about 130 high-quality audiovisual recordings made in 9 cities worldwide. The csv and json files contain the recording locations. Each recording consists of a 360-degree video file (4096 x 2048 resolution, 30 fps), a 4-channel first-order ambisonics (ACN/SN3D) audio file and/or a binaural audio file. All audio files have a sample rate of 48 kHz and are 24-bit PCM encoded. All audio and video files are time-synchronized.
Recordings are made during the day, in favorable weather conditions with little to no wind. Note that each recording presents a snapshot in time. Combined and simultaneous audio and video recordings are made using a portable, stationary recording setup, as shown in the picture. The setup consists of the following components (from top to bottom):
1. First-order ambisonics: Core Sound TetraMic with windshield and Tascam DR-680 MkII 4-channel recording device;
2. 360-degree video camera: GoPro Omni spherical camera system (only available upon request);
3. Binaural audio: HEAD acoustics HSU III.2 artificial head with windshield and SQobold 2-channel recording device.
The ears of the artificial head, the video camera system and the ambisonics microphone are located at heights of about 1.5 m, 1.7 m and 1.9 m, respectively. At each location, the recording system is oriented towards the most important sound source and/or the most prominent visual scene. This orientation defines the initial frontal viewing direction for the 360-degree video and ambisonics recordings, and the fixed orientation for the binaural recordings.
All audio files are calibrated to the same reference, so once a playback setup has been calibrated, it can be used to play all files. The csv and json files also contain the one-minute LAeq values of the binaural recordings (the average of the left and right channels, and the left and right channels separately). These values are the most representative of the LAeq at the location.
The second column presents the LAeq of the mono mix (superposition) of the left and right channels of the binaural recording. Note that this is not necessarily the same as the (energetic) average of the LAeq values of the left and right channels separately, because the two channels are correlated to some degree (depending on the diffuseness of the sound field). This explains why the (energetic) average of the third and fourth columns will not always correspond exactly to the value in the second column, although the difference is usually small. Roughly speaking, the larger the difference, the more the sound at both ears is correlated.
More details on the recording setup and protocol can be found in our publications. Note that some publications contain LAeq values that were calculated from the ambisonics recordings (W channel). There is no real standard way of calculating LAeq values from ambisonics recordings, so these values may be less suitable in most cases.
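The energetic average mentioned above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the example levels are hypothetical and are not taken from the database files, whose exact column names are not given here.

```python
import math


def energetic_average_db(*levels_db):
    """Energetic (power) average of sound levels given in dB.

    Each level is converted to a relative power, the powers are
    averaged arithmetically, and the mean is converted back to dB.
    """
    mean_power = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def mix_minus_average_db(mix_db, left_db, right_db):
    """Difference between a mono-mix level and the energetic average
    of the separate left and right channel levels, in dB."""
    return mix_db - energetic_average_db(left_db, right_db)


# Made-up illustration values (dB), standing in for columns 2, 3 and 4.
mix_db, left_db, right_db = 63.1, 62.0, 65.0
print(f"energetic average of L and R: {energetic_average_db(left_db, right_db):.2f} dB")
print(f"mono mix minus average:       {mix_minus_average_db(mix_db, left_db, right_db):+.2f} dB")
```

Averaging is done on powers rather than on the dB values themselves, which is why the result sits closer to the louder channel than a plain arithmetic mean would.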
Created:
2023-11-23
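Collected summary
Dataset introduction

Background and challenges
Background overview
This dataset is a collection of high-quality audiovisual recordings of urban soundscapes worldwide. It contains about 130 recordings covering 9 cities and provides 360-degree video, 4-channel ambisonics audio and binaural audio files; all files are time-synchronized and the audio parameters are consistent. The dataset aims to follow the ISO 12913-2 standard and to provide reference examples for urban acoustic environment research and immersive reproduction, supporting the exploration of how soundscapes affect the quality of urban spaces and human well-being.
The above content was collected and summarized by 遇见数据集.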



