Acoustic features as a tool to visualize and explore marine soundscapes: Applications illustrated using marine mammal Passive Acoustic Monitoring datasets

Mendeley Data · Updated 2024-04-13 · Indexed 2024-06-28
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.3bk3j9kn8
Resource description:
# Data for: Acoustic features as a tool to visualize and explore marine soundscapes: applications illustrated using marine mammal Passive Acoustic Monitoring datasets.

The data and scripts provided here allow replicating the results presented in the publication: "Acoustic features as a tool to visualize and explore marine soundscapes: applications illustrated using marine mammal Passive Acoustic Monitoring datasets."

# **List of tables:**

***SM_1_WMD_Features_and_Labels.csv*** -> table containing VGGish features extracted from audio files downloaded from the Watkins Marine Mammal Sound Database. Missing values in this dataset are marked as **nan**.

**Fields description:**

* ID_row: progressive ID number for each row in the dataset
* 0 - 127: labels for the 128 VGGish features
* ID: reference to the Watkins Marine Mammal Sound Database. Each ID corresponds to an audio file stored in the database.
* SPECIES: species associated with each recording from the Watkins Marine Mammal Sound Database. The species identifiers are coded using four characters: the first two letters of the genus, followed by the first two letters of the species (e.g., *Eubalaena glacialis* -> Eugl).
* HGROUP: marine mammal functional hearing group (HF: high-frequency species; LF: low-frequency species)
* TAX: taxonomic group (Mys: Mysticete; Odo: Odontocete)
* COUNTRY: labels for the country of origin of the recording, obtained from the Watkins Marine Mammal Sound Database (Us: United States; Ca: Canada; Ns: Canada - Nova Scotia; Nr: Norway; Bm: Bahamas; Uk: British Virgin Islands; Pr: Puerto Rico; Au: Australia; Ar: Argentina; It: Italy; Sl: Saint Lucia; Svg: Saint Vincent and the Grenadines; Ma: Madeira; Ml: Malta; Cr: Croatia). **NOTE:** This field contains empty cells. Records with unknown or unspecified country of origin were left empty (no value assigned) to indicate a missing value.
* sample_ID: progressive ID for each VGGish feature within an audio file.
* prog_ID: field combining ID and sample_ID to uniquely identify each VGGish feature as a 960 ms fragment of a recording from the Watkins Marine Mammal Sound Database.

***SM_2_Annotations_Dataframe_Multilable.xlsx*** -> Excel table containing annotated audio files and corresponding VGGish features for a subset of the Placentia Bay PAM dataset. The annotations were prepared using Raven software.

**Fields description:**

* ID_row: progressive ID number for each row in the dataset
* 0 - 127: labels for the 128 VGGish features
* HW detection; HW visual: humpback whale model detections and visual detections (0 = absence; 1 = presence)

***SM_3_RI_features_database.csv*** -> VGGish feature dataset for the full Placentia Bay PAM dataset with time stamps.

**Fields description:**

* File: original file name of the audio file, with the start time embedded in the file name (e.g., AMAR667.20190722T054122Z).
* Channel: selected channel (1 for all audio files).
* Begin Time (s) & End Time (s): elapsed time from the start of the audio file to the beginning and the end, respectively, of the time window used to generate acoustic features. Times begin at 0 and progress in increments of 4.8 s until the end of the audio file. The Begin Time is reset to 0 at the start of each subsequent audio file.
* Low Freq (Hz) & High Freq (Hz): lower (Low Freq) and upper (High Freq) frequency limits of the audio samples in Hz.
* Delta Time: difference between End Time (s) and Begin Time (s)
* Delta Freq (Hz): difference between High Freq (Hz) and Low Freq (Hz)
* Avg Power Density (dB FS/Hz): uncalibrated average power density for the audio sample
* 0 - 127: labels for the 128 VGGish features
* HW_detection & HW visual: humpback whale detections produced by PacificSoundDetectHumpbackSong and detections marked through visual inspection of audio recordings (0 = absence; 1 = presence).
* location: hydrophone deployment location

***SM_4_PBD_Oceanographic_Data*** -> table containing environmental variables collected by the Smart Atlantic Buoy located in Red Island (Placentia Bay), in proximity to the hydrophone deployment location of the Placentia Bay PAM dataset.

**NOTE:** This table contains empty cells. Records with no measures available in the original data were left empty (no value assigned) to indicate a missing value.

**Fields description:**

* station_name: unique ID field for the station
* time: time stamp for the oceanographic data, in the format yyyy-mm-dd HH:MM:SS.00
* longitude & latitude (precise_longitude & precise_latitude): general location of the station (precise location of the instrument)
* wind_spd_avg & wind_spd2_avg: average wind speed in m/s
* wind_spd_max & wind_spd2_max: maximum speed of wind gusts in m/s
* wind_dir_avg & wind_dir2_avg: average wind direction in degrees
* air_temp_avg: average air temperature in degrees Celsius
* air_pressure_av: average atmospheric pressure in millibar (mbar)
* air_humidity_avg: average air humidity, unitless and ranging 0-100
* air_dewpoint_avg: dewpoint temperature in degrees Celsius
* surface_temp_avg: average temperature at the ocean surface, in degrees Celsius
* wave_ht_max: sea surface wave maximum height (m)
* wave_ht_sig: sea surface wave significant height (m)
* wave_period_max: sea surface wave maximum period (s)
* wave_dir_avg: average sea surface wave direction in degrees
* wave_spread_avg: sea surface wave directional spread in degrees
* curr_dir_avg: sea water velocity to direction in degrees
* curr_spd_avg: sea water speed in mm/s

Metadata and variables descriptions can be found here:

# **List of scripts**

The scripts provided here read the data tables and reproduce the analysis and figures presented in the manuscript. The scripts were prepared using Google Colaboratory and written in Python.
Running the scripts requires connecting the notebook to a GDrive account where the data tables have been uploaded.

* SM_5_WMD_species_and_locations.ipynb -> the script replicates the analysis performed on the recordings from the Watkins Marine Mammal Sound Database.
* SM_6_PBD_Detections.ipynb; SM_7_PBD_Ocean_Variables.ipynb -> the scripts replicate the analysis performed on the Placentia Bay PAM dataset.

**External Data Sources and Scripts:**

* Source of archival marine mammal vocalizations: Watkins Marine Mammal Sound Database, Woods Hole Oceanographic Institution and the New Bedford Whaling Museum. Free to download for personal or academic (not commercial) use.
* Source of oceanographic data: Smart Atlantic Buoy - Red Island
* CNN model for the detection of humpback whale vocalizations: PacificSoundDetectHumpbackSong. Copyright (c) 2022, MBARI. (License: GNU GPL)
* Nested cross validation adapted from: Jason Brownlee, *Nested Cross-Validation for Machine Learning with Python*, Machine Learning Mastery.
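
The File and Begin Time (s) fields of SM_3_RI_features_database.csv together place each 4.8 s feature window in absolute time. The following is a minimal sketch of that calculation, not code from the published notebooks. It assumes that every file name follows the pattern of the example given above (a recorder prefix, a dot, then a start time written as `YYYYMMDDTHHMMSSZ`) and that the trailing `Z` denotes UTC; the function names `parse_file_start` and `window_start` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_file_start(file_name: str) -> datetime:
    """Return the start time embedded in a file name such as
    'AMAR667.20190722T054122Z'.

    Assumption: the second dot-separated token is the time stamp and the
    trailing 'Z' means UTC.
    """
    stamp = file_name.split(".")[1]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def window_start(file_name: str, begin_time_s: float) -> datetime:
    """Absolute start of a feature window: file start plus 'Begin Time (s)'."""
    return parse_file_start(file_name) + timedelta(seconds=begin_time_s)


# Second window of the example file (Begin Time resets to 0 for each file,
# then advances in 4.8 s increments).
print(window_start("AMAR667.20190722T054122Z", 4.8).isoformat())
```

Absolute time stamps of this kind are what would allow feature rows to be aligned with the `time` field of SM_4_PBD_Oceanographic_Data, although the matching rule used in the manuscript is not described here.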

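
The four-character coding rule described for the SPECIES field of SM_1_WMD_Features_and_Labels.csv can be illustrated with a short function. This is a sketch of the stated rule only (first two letters of the genus followed by the first two letters of the species); the function name `species_code` is hypothetical, and the capitalization is assumed from the single example given in the field description.

```python
def species_code(scientific_name: str) -> str:
    """Build a four-character identifier from a binomial name.

    Rule from the field description: first two letters of the genus,
    followed by the first two letters of the species epithet.
    Assumption: genus part capitalized, species part lower case,
    as in the documented example 'Eugl'.
    """
    genus, species = scientific_name.split()[:2]
    return genus[:2].capitalize() + species[:2].lower()


print(species_code("Eubalaena glacialis"))  # Eugl
```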
Created:
2024-02-17