Data for: ToadFishFinder classifier model v4: A catalog of oyster toadfish (Opsanus tau) calls for machine learning
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.gtht76hr9
Description:
This data repository contains labeled passive underwater acoustic data used to train and test the machine-learning model of Bohnenstiehl (in prep – 2023), "Automated cataloging of oyster toadfish (Opsanus tau) calls using template matching and machine learning." The software accompanying this paper is known as ToadFishFinder, and the classifier model presented in the paper is v4. The dataset consists of more than 10,000 labeled toadfish signals and more than 10,000 labeled 'other' signals. Labeled spectrogram images are provided, along with pressure-corrected waveforms (micro-Pascals) sampled at 24 kHz. Each waveform sample is 1350 ms long. The center 850 ms of each waveform segment represents the portion of the signal used in training and testing the classifier model. Waveform data are provided in two formats: 1) MATLAB (.mat) files containing the 'boatwhistle' and 'other' waveforms stored in column format, and 2) individual .wav files, each containing a labeled waveform example. Scripts are provided to demonstrate how these .wav files can be read into MATLAB and Python. These labeled data can be used to re-train the ToadFishFinder model or to develop alternative classifiers.
Methods
Labeled signals were extracted from passive acoustic data collected at eight sites within southwestern Pamlico Sound near its confluence with the Pamlico and Neuse River estuaries. As part of a larger effort to monitor the evolution of these reef habitats, each site was outfitted with a SoundTrap 300 hydrophone affixed ~0.5 m above the seabed at the top of a metal stake anchored with a concrete block. Monitoring extended from the fall of 2016 through the fall of 2017, and from the spring of 2018 through the fall of 2018. Over most of the monitoring period, the recorders were programmed to capture a 2-minute-duration recording every 20 minutes. Acoustic data were collected at a rate of 96,000 samples/second. They were subsequently resampled to a rate of 24 kHz with the application of an anti-aliasing filter.
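The resampling step from 96 kHz to 24 kHz can be illustrated with a short sketch. The text does not say which filter or software was used, so the Hamming-windowed sinc design, the 101-tap length, the 10 kHz cutoff, and the function names below are all assumptions chosen for illustration. The point is only that frequencies above the new 12 kHz Nyquist limit must be suppressed before every fourth sample is kept.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass filter, normalised to unit gain at DC.

    num_taps is assumed to be odd so the filter has a single centre tap.
    """
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def decimate(x, factor, taps):
    """Low-pass filter x, then keep every `factor`-th output sample.

    Only fully overlapped ("valid") filter positions are evaluated, so the
    output is slightly shorter than len(x) / factor.
    """
    n_taps = len(taps)
    out = []
    for start in range(0, len(x) - n_taps + 1, factor):
        acc = 0.0
        for j in range(n_taps):
            acc += taps[j] * x[start + n_taps - 1 - j]
        out.append(acc)
    return out

FS_IN_HZ = 96_000                     # recording rate quoted in the text
FS_OUT_HZ = 24_000                    # rate after resampling
FACTOR = FS_IN_HZ // FS_OUT_HZ        # integer decimation factor of 4

# Illustrative filter: cutoff below the new Nyquist frequency (12 kHz).
TAPS = lowpass_fir(101, 10_000.0, FS_IN_HZ)
```

In practice a library routine such as MATLAB's `resample` or SciPy's `scipy.signal.resample_poly` would normally be preferred over a hand-written filter.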
ToadFishFinder's spectrogram correlation detector was deployed on hundreds of randomly selected files spanning the 2+ year monitoring period and all eight sites. This approach ensured that the training and test datasets captured calls from estuarine soundscapes across various seasons and with varying anthropogenic, geophysical (wind, waves, rain) and biological noise. A spectrogram, filtered waveform and spectrogram image were displayed for each detection, and the spectrogram images of signals labeled as ‘bwhistle’ or ‘other’ were retained for training and testing purposes. The final labeled catalog consisted of more than 10,000 signals within the ‘bwhistle’ class and more than 10,000 signals within the ‘other’ class.
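For readers unfamiliar with spectrogram correlation, the general idea can be sketched as follows. This is a generic illustration, not ToadFishFinder's detector (which is implemented in MATLAB in the repository linked below). The Pearson-correlation score, the threshold, and all function names are assumptions. A template is slid along the time axis of a spectrogram, a similarity score is computed at each offset, and local maxima above a threshold are reported as candidate detections.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def correlation_scores(spec, template):
    """Score every time offset of `template` within `spec`.

    Both arguments are lists of rows (frequency bins x time frames) and must
    have the same number of frequency bins.
    """
    n_time = len(spec[0])
    width = len(template[0])
    flat_template = [v for row in template for v in row]
    scores = []
    for t in range(n_time - width + 1):
        patch = [v for row in spec for v in row[t:t + width]]
        scores.append(pearson(patch, flat_template))
    return scores

def detect(scores, threshold):
    """Return offsets that are local maxima of `scores` and reach `threshold`."""
    hits = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else -math.inf
        right = scores[i + 1] if i + 1 < len(scores) else -math.inf
        if s >= threshold and s > left and s >= right:
            hits.append(i)
    return hits
```

Detections found this way still include false positives from noise that resembles the template, which is why each detection in the catalog was labeled as either a call or 'other'.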
For each detection, an 850-ms-duration sample, beginning 400 ms before the detection time, was extracted from the unfiltered waveform data. A frequency-reassigned spectrogram was formed over the frequency range between 0 and 1200 Hz. Each spectrogram was converted to an RGB image and resized to 224-by-224 pixels. These spectrogram images are stored in the './bwhistle' and './other' folders in the compressed folder spectrograms_TFv4.zip.
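Because the class label is carried by the folder name, labels can be recovered directly from the archive listing. The sketch below uses only Python's standard `zipfile` module. It assumes that each file sits directly inside a folder named 'bwhistle' or 'other' somewhere in the archive. The exact internal layout and file extensions are not stated in the text, and the function names are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

LABELS = ("bwhistle", "other")  # class folder names quoted in the text

def labeled_members(zip_path):
    """Return (member_name, label) pairs for files held in a class folder."""
    pairs = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            # The label is the name of the folder that directly holds the file.
            if len(parts) >= 2 and parts[-2] in LABELS:
                pairs.append((info.filename, parts[-2]))
    return pairs

def class_counts(pairs):
    """Count how many archive members carry each label."""
    counts = {label: 0 for label in LABELS}
    for _, label in pairs:
        counts[label] += 1
    return counts
```

Calling `class_counts(labeled_members("spectrograms_TFv4.zip"))` on the downloaded archive would then give a quick check that both classes are present in the expected numbers.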
Waveform data snippets are also provided for these labeled signals. Each snippet is 1350 ms long, with the training and test data (used in generating the spectrograms) representing the center 850 ms of data (points 6,000–26,401). These waveforms are unfiltered, pressure-corrected and sampled at 24 kHz. The snippets are provided in two formats. For those working in MATLAB, two .mat files are included (bwhistle_wavefrom_database_TFv4.mat and other_wavefrom_database_TFv4.mat). Each contains a 32,401 × N matrix with the snippets stored in columns, along with variables indicating station names, UTC times, sample rate and a time vector for plotting. For those working in other software, the snippets are also saved as individual .wav files stored in the './bwhistle' and './other' folders within the wavclips_TFv4.zip file. Scripts for reading these .wav files in MATLAB and Python are provided.
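The sample counts quoted above follow from the durations and the sample rate, as the sketch below works through. It uses zero-based indexing and hypothetical helper names. In this convention the centered 850 ms window runs from index 6,000 to index 26,400 inclusive, so the quoted end points (6,000–26,401) may differ by one sample depending on the indexing convention. The time vector stored in the .mat files is the authoritative reference.

```python
FS_HZ = 24_000       # sample rate quoted in the text
SNIPPET_MS = 1350    # full snippet duration
WINDOW_MS = 850      # portion used for training and testing

def ms_to_samples(duration_ms, fs_hz=FS_HZ):
    """Number of sample intervals spanned by duration_ms (exact for these values)."""
    return duration_ms * fs_hz // 1000

def center_window(snippet_ms=SNIPPET_MS, window_ms=WINDOW_MS, fs_hz=FS_HZ):
    """Zero-based (first, last) indices of the centered window, both inclusive."""
    pad_ms = (snippet_ms - window_ms) // 2      # 250 ms on each side
    first = ms_to_samples(pad_ms, fs_hz)
    last = first + ms_to_samples(window_ms, fs_hz)
    return first, last

# A 1350 ms snippet that includes both end points has one more sample than
# it has sample intervals, which matches the 32,401 rows quoted in the text.
n_rows = ms_to_samples(SNIPPET_MS) + 1

first, last = center_window()

# Stand-in for one waveform column; a real column would hold pressure values.
column = list(range(n_rows))
clip = column[first:last + 1]
```

The same slice can be applied to every column of the waveform matrix, or to each .wav clip once it has been read with the scripts supplied in the repository.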
The MATLAB-based ToadFishFinder software is available here:
https://github.com/drbohnen/ToadFishFinder
Created:
2023-08-08



