Data for: Inadequate sampling of the soundscape leads to overoptimistic estimates of recogniser performance: A case study of two sympatric macaw species

Source: DataONE. Updated 2023-01-13; indexed 2025-08-02.

Download link:

https://search.dataone.org/view/sha256:f6db4c34d457a0e9f58cc58225c9fda266bba3e2d00b318e6c2e959898ef20b4

Description:

Passive acoustic monitoring (PAM) offers the potential to dramatically increase the scale and robustness of species monitoring in rainforest ecosystems. PAM generates large volumes of data that require automated methods of target-species detection. Species-specific recognisers, which often use supervised machine learning, can achieve this goal. However, they require a large training dataset of both target and non-target signals, which is time-consuming and challenging to create. Unfortunately, very little information about creating training datasets for supervised machine learning recognisers is available, especially for tropical ecosystems. Here we show an iterative approach to creating a training dataset that improved recogniser precision from 0.12 to 0.55. By sampling background noise using an initial small recogniser, we addressed one of the significant challenges of training dataset creation in acoustically diverse environments. Our work demonstrates that recognisers will likely f...

Raw data used to create this dataset was collected with autonomous recording units in northern Costa Rica. A template-matching process was used to identify candidate signals, and a one-second window was then placed around each candidate signal. We extracted a total of 113 acoustic features using the warbleR package in R (R Core Team, 2020): 20 measurements of frequency, time, and amplitude parameters, and 93 Mel-frequency cepstral coefficients (MFCCs) (Araya-Salas and Smith-Vidaurre, 2017). This dataset also includes the results of manually checking the detections output by a trained random forest. These detections were initially exported as selection tables; each sound file was loaded into Raven Lite together with its selection table, and each detection was manually checked and labelled.
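The windowing step can be illustrated with a short sketch. It is written in Python rather than the R used for the original workflow, and it makes two assumptions the description does not state: each window is centred on the candidate's detection time, and a window that would run past the start or end of a recording is shifted inward so it always spans exactly one second. The names `Window` and `one_second_windows` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def one_second_windows(candidate_times, recording_length, width=1.0):
    """Place a fixed-width window around each candidate detection time.

    Assumption (not from the dataset description): windows are centred on
    the candidate and shifted inward at the recording edges. Requires
    recording_length >= width.
    """
    if recording_length < width:
        raise ValueError("recording is shorter than the window width")
    half = width / 2
    windows = []
    for t in candidate_times:
        start = t - half
        # Keep the whole window inside [0, recording_length].
        start = max(0.0, min(start, recording_length - width))
        windows.append(Window(start, start + width))
    return windows


print(one_second_windows([0.2, 5.0, 59.9], recording_length=60.0))
# [Window(start=0.0, end=1.0), Window(start=4.5, end=5.5), Window(start=59.0, end=60.0)]
```

Shifting rather than truncating edge windows keeps every clip the same length, which keeps duration-dependent features comparable across candidates; whether the original pipeline did this is not stated.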
The dataset also includes the trained random forest model, saved in .rds format and created with tidymodels in R. Following the code associated with this data requires R; the machine-learning outputs require Raven Lite to open. The raw recordings are not included in this dataset.
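Raven selection tables are tab-delimited text files, so manually checked detections can also be summarised outside Raven Lite. The sketch below uses only Python's standard library to compute precision, that is, true positives divided by all detections, since every row in the table is a positive prediction from the recogniser. It assumes a hypothetical `Label` column holding the manual verdict with values `target` or `non-target`; the column names and label values in this dataset may differ. For scale, a precision of 0.12 means roughly 12 of every 100 detections were the target, compared with roughly 55 of every 100 at 0.55.

```python
import csv
import io


def precision_from_selection_table(text, label_column="Label", target="target"):
    """Return true positives / all detections for a tab-delimited table.

    Every row is assumed to be one recogniser detection; the (hypothetical)
    label_column holds the manual verdict for that detection.
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    if not rows:
        raise ValueError("selection table contains no detections")
    true_positives = sum(1 for row in rows if row[label_column].strip() == target)
    return true_positives / len(rows)


# A tiny made-up table; real Raven selection tables carry more columns.
example = (
    "Selection\tBegin Time (s)\tEnd Time (s)\tLabel\n"
    "1\t12.0\t13.0\ttarget\n"
    "2\t40.5\t41.5\tnon-target\n"
    "3\t88.2\t89.2\tnon-target\n"
    "4\t90.0\t91.0\ttarget\n"
)
print(precision_from_selection_table(example))  # 0.5
```

Reading the file as text with `csv.DictReader` keeps the sketch independent of Raven itself; for a file on disk, open it with `newline=""` and pass the file object's contents in the same way.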

Created:

2025-07-21