Deep convolutional neural network for owl vocal identification
Source: NIAID Data Ecosystem. Indexed 2026-03-11.
Download link:
https://zenodo.org/record/3338549
Description:
This repository contains all the code and data necessary to replicate the results presented in Ruff et al. 2019, "Automated identification of avian vocalizations with deep convolutional neural networks", and is published in support of that manuscript. The folder includes several Python scripts, our trained convolutional neural network (CNN), and a set of 164,210 spectrogram images that were reviewed to generate CNN performance metrics. We include the CNN's predicted class scores for the test images as well as the set of labels assigned to the same images by experienced human technicians. The published article can be found here: https://zslpublications.onlinelibrary.wiley.com/doi/full/10.1002/rse2.125
As presented, the CNN is designed to accept grayscale PNG images at 500x129 resolution and will generate a set of seven class scores for each image. Class scores are the softmax activation from the final (seven-unit) fully-connected layer of the CNN. Scores are bounded between 0 and 1 and sum to 1 for each image. This means target classes are implicitly treated as mutually exclusive (i.e., each image belongs to exactly one class), although in reality some images contain calls from more than one target species.
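To make the softmax behavior concrete, here is a minimal sketch in plain Python. It is not taken from the repository scripts: the class order in CLASSES and the raw layer outputs in logits are assumptions made up for illustration, and the order actually used by the trained CNN should be taken from the training code.

```python
import math

# Hypothetical class order, for illustration only.
CLASSES = ["AEAC", "BUVI", "GLGN", "MEKE", "STOC", "STVA", "Noise"]


def softmax(logits):
    """Convert raw final-layer outputs to scores in [0, 1] that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(scores, classes=CLASSES):
    """Return the (label, score) pair with the highest class score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best], scores[best]


# Made-up raw outputs for one spectrogram in which two owl classes are active.
logits = [0.2, -1.0, 0.1, 0.0, 3.0, 2.5, -0.5]
scores = softmax(logits)
label, score = top_class(scores)
```

Because the scores must sum to 1, a clip containing two species splits its probability mass between them, so neither class can score close to 1. This is worth keeping in mind when choosing a score threshold for review.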
The different scripts and their functions are as follows:
- Code used to construct and train the CNN is in Owl_CNN_train_model.py
- Code to generate spectrograms with randomized parameters based on tagged calls in audio files is in Owl_CNN_generate_training_data.py
- Code to generate random spectrograms from a set of audio files (used to generate training data for the Noise class) is in Owl_CNN_make_noise_data.py
- Code used to process raw audio files, including segmenting them into 12 s clips, generating spectrograms, and generating class scores using a pre-trained CNN, is in Owl_CNN_process_audio.py
- Code to generate class scores for an existing set of spectrogram images using a pre-trained CNN is in Owl_CNN_process_images.py
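As an illustration of the 12 s segmentation step, the sketch below computes clip boundaries for a recording of known duration. It is a hypothetical helper, not code from Owl_CNN_process_audio.py, and it assumes non-overlapping clips with a shorter final clip kept; the actual script may handle overlap and trailing audio differently.

```python
def clip_bounds(duration_s, clip_len_s=12.0):
    """Return (start, end) times in seconds for consecutive clips.

    Assumes non-overlapping clips; the last clip may be shorter than
    clip_len_s if the recording length is not a multiple of it.
    """
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_len_s, duration_s)
        bounds.append((start, end))
        start += clip_len_s
    return bounds


# A 30 s recording yields two full clips and one 6 s remainder.
example = clip_bounds(30.0)
```

Each (start, end) pair could then be handed to an audio tool such as SoX to cut the clip and render its spectrogram.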
Our seven target classes are as follows:
AEAC - Northern saw-whet owl, Aegolius acadicus.
BUVI - Great horned owl, Bubo virginianus.
GLGN - Northern pygmy-owl, Glaucidium gnoma.
MEKE - Western screech-owl, Megascops kennicottii.
STOC - (Northern) spotted owl, Strix occidentalis caurina.
STVA - Barred owl, Strix varia.
Noise - Catch-all for any clip that did not contain vocalizations of at least one of the six owl species listed above.
The CNN was trained for 100 epochs and saved only after epochs in which validation loss improved. Loss was measured as categorical cross-entropy. The CNN was last saved at epoch 97 with reported metrics:
Training loss = 0.218
Training accuracy = 0.972
Validation loss = 0.165
Validation accuracy = 0.987
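The save-on-improvement rule and the loss function can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the training code: the validation losses are made-up values, and the helper names are hypothetical. In Keras, this kind of behavior is commonly obtained with the ModelCheckpoint callback and save_best_only=True, though this description does not state that the authors used it.

```python
import math


def categorical_cross_entropy(true_index, scores, eps=1e-12):
    """Cross-entropy for one image with a one-hot label.

    With a one-hot target, the loss reduces to -log of the score
    assigned to the true class.
    """
    return -math.log(max(scores[true_index], eps))


def best_checkpoints(val_losses):
    """Return the 1-based epochs at which the model would be saved,
    i.e. epochs where validation loss is lower than any earlier epoch."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved


# Made-up validation losses for six epochs.
val_losses = [0.90, 0.60, 0.65, 0.40, 0.41, 0.38]
saved_epochs = best_checkpoints(val_losses)
```

Under this rule, a final save at epoch 97 of 100 implies that validation loss did not improve during epochs 98 to 100.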
Although this code has been tested and works on our system, we make no guarantee that it will work for others without modification. Created using Python version 2.7.14, TensorFlow version 1.2.1, Keras version 2.2, and SoX version 14.4. Code was developed by Bharath Padmaraju, Zack Ruff, and Chris Sullivan. Questions and comments may be directed to zjruff at gmail dot com.
Zack Ruff
15 July 2019
Created:
2020-01-24



