Seismic-Acoustic Dataset of Coastal Bryde’s Whales in the Beibu Gulf
Source: DataCite Commons. Updated: 2026-03-05. Indexed: 2026-02-09.
Download link:
https://figshare.com/articles/dataset/Seismic-Acoustic_Dataset_of_Coastal_Bryde_s_Whales_in_the_Beibu_Gulf/30363799
Resource description:
1. Overview

This repository contains the dataset and deep learning framework used in the study “Listening to Whales with an Island Seismometer: Year-Round Presence and Diel Rhythms of Bryde’s Whales Unveiled by Deep Learning.”

It provides a complete and reproducible pipeline for detecting coastal Bryde’s whale vocalizations using three-component seismic data recorded at the XYD (Xieyang Island) station in the Beibu Gulf, northwestern South China Sea, during January–December 2021.

The repository includes:
- A preprocessed dataset of labeled spectrograms
- CNN-ECA model source code and trained weights
- Configuration and environment files for reproducible research
- A manually annotated catalog of whale vocalizations
- A full-year model inference catalog of predicted whale vocalizations

2. File Structure

Path                                  Description
dataset/                              Folder containing preprocessed spectrogram data and labels.
├── all_labels.xls                    Metadata for all samples (timestamps, labels, data source).
├── split_info.xls                    Summary of dataset split ratios.
├── *_indices.npy                     Index files for train/validation/test subsets.
├── *_spectrograms.npy                3-channel normalized spectrogram arrays for each subset.
├── y_*.xls                           Label files for train/validation/test sets.
Dataset_S1_annotated_catalog.csv      Manually annotated catalog of Bryde’s whale vocalizations (see Section 4).
Dataset_S2_dl_catalog.csv             Full-year inference catalog of predicted whale vocalizations (see Section 5).
config.json                           Configuration file for data paths and hyperparameters.
train.py                              CNN-ECA model training script.
test.py                               Model evaluation script.
optimized_whale_detector_best.pth     Trained model weights (best validation F1-score).
requirements.txt                      Python dependencies for environment setup.

3. Dataset Description

- Sampling rate: 100 Hz
- Frequency band: 3–20 Hz (Butterworth band-pass filtered)
- Channels: North, East, Vertical (three-component seismic data)
- Sample length: 10 seconds
- Data format: Log-scaled, z-score normalized spectrograms
- Data shape: [N, 3, F, T]

Labels
- 1 = Bryde’s whale vocalization
- 0 = Background / non-vocalization

Data Split
- 70% training
- 15% validation
- 15% test

4. Manually Annotated Catalog

Dataset_S1_annotated_catalog.csv

This file contains the manually annotated Bryde’s whale vocalizations used to construct the training dataset. Annotations were performed by visually inspecting spectrograms and identifying characteristic low-frequency pulse sequences associated with Bryde’s whale calls. All timestamps are reported in Beijing Time (UTC+8).

Columns
- start_time_bjt – Start time of the annotated vocalization segment (Beijing Time, UTC+8)
- end_time_bjt – End time of the annotated vocalization segment (Beijing Time, UTC+8)
- duration_s – Duration of the vocalization segment in seconds
- freqmin_hz – Minimum frequency of the vocalization (Hz)
- freqmax_hz – Maximum frequency of the vocalization (Hz)

Each row corresponds to one annotated Bryde’s whale vocalization event. This catalog forms the ground-truth reference used for dataset construction and model training.

5. Full-Year Inference Catalog

Dataset_S2_dl_catalog.csv

This file contains the deep learning model inference results for the entire year of 2021. Using the trained CNN-ECA model (optimized_whale_detector_best.pth), continuous seismic data from January–December 2021 were segmented into 10-second windows and processed sequentially. All segments predicted as class 1 (Bryde’s whale vocalization) were extracted and compiled into this catalog.

Columns
- start_time_bjt – Start time of the 10-second segment (Beijing Time, UTC+8)
- end_time_bjt – End time of the 10-second segment (Beijing Time, UTC+8)

Each row corresponds to one 10-second segment classified as containing a Bryde’s whale vocalization.

This catalog represents the basis for analyzing:
- Year-round presence
- Seasonal patterns
- Diel (daily) vocal activity rhythms
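
The diel and seasonal analyses listed above start from simple counts of detected 10-second segments per local hour and per month. The following is a minimal sketch of that counting step, not the authors' analysis code. It assumes that the timestamps in Dataset_S2_dl_catalog.csv are ISO-like strings (for example 2021-03-01 23:59:50), which the description does not specify. The helper names parse_bjt, to_utc, and count_detections are hypothetical, and the demo rows are made up for illustration only.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timedelta, timezone

# Beijing Time is UTC+8, as stated in the catalog description.
BJT = timezone(timedelta(hours=8))


def parse_bjt(text):
    """Parse a timestamp string and mark it as Beijing Time.

    Assumption: the string is ISO-like, e.g. '2021-03-01 23:59:50'.
    """
    return datetime.fromisoformat(text.strip()).replace(tzinfo=BJT)


def to_utc(text):
    """Convert a Beijing Time timestamp string to an aware UTC datetime."""
    return parse_bjt(text).astimezone(timezone.utc)


def count_detections(csv_file):
    """Count detected segments per local hour (0-23) and per month (1-12)."""
    by_hour, by_month = Counter(), Counter()
    for row in csv.DictReader(csv_file):
        start = parse_bjt(row["start_time_bjt"])
        by_hour[start.hour] += 1
        by_month[start.month] += 1
    return by_hour, by_month


# Made-up rows standing in for the real catalog (illustration only).
demo_csv = io.StringIO(
    "start_time_bjt,end_time_bjt\n"
    "2021-01-01 23:59:50,2021-01-02 00:00:00\n"
    "2021-01-02 23:10:00,2021-01-02 23:10:10\n"
    "2021-07-15 04:00:00,2021-07-15 04:00:10\n"
)
hours, months = count_detections(demo_csv)
for hour in sorted(hours):
    print(f"hour {hour:02d} (BJT): {hours[hour]} segment(s)")
for month in sorted(months):
    print(f"month {month:02d}: {months[month]} segment(s)")
```

To run this on the real catalog, pass an open file handle instead of the in-memory demo, for example open("Dataset_S2_dl_catalog.csv", newline=""). Counting by local hour keeps the diel pattern aligned with local day and night at the station; to_utc is only needed when comparing against data stored in UTC.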
Provider:
figshare
Created:
2025-10-15



