Large-scale, Stratified, Fully Annotated Acoustic Forest Soundscape Dataset of Avian Vocalizations from Eastern North America
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18041380
Description:
This dataset contains 1302 10-minute soundscape recordings annotated by expert ornithologists, resulting in approximately 183,000 vocalizations labelled for 96 bird species from the northeastern USA. The data were recorded at 104 sites in four parks: Acadia National Park - Bar Harbor, Maine (ACAD, n=31), Hubbard Brook Experimental Forest - in White Mountains National Forest, New Hampshire (HBEF, n=24), Katahdin Woods and Waters National Monument - Millinocket, Maine (KAWW, n=20), and Marsh-Billings-Rockefeller National Historical Park - Woodstock, Vermont (MABI, n=25) in 2022 and 2023. These datasets are intended to facilitate reproducible research and to support the development and evaluation of automated bioacoustic analysis methods across ecology and machine learning. Additionally, these high-quality annotated datasets can be used to address other ecological and behavioral questions, including evaluating geographic variation in bird song types and song production, the temporal relationships between intra- and interspecific vocalizing individuals, and the phenology of singing behavior across species.
Data collection
We deployed SwiftOne recorders at 104 sites across four parks in the northeastern USA during the breeding season (May-Jul) in 2022 and 2023. Each autonomous recording unit (ARU) had one omnidirectional microphone with a sensitivity of -25 dB re 1V/Pa and a signal-to-noise ratio of 62 dB, fitted with a wind protector (WindTech 10380 military high-density windscreen). The recordings were made at a sampling rate of 32 kHz. The analog signal was amplified by 33 dB using a preamplifier prior to digitization. It was then converted at 16-bit resolution, with the analog-to-digital converter (ADC) clipping at an instantaneous input voltage of ±0.9 V. This ongoing study aims to investigate the vocal activity patterns and seasonally changing diversity of local bird species. Recordings were collected for 5 hours in the morning (05:00 - 10:00 Eastern Daylight Time [EDT]) and 1 hour in the evening (19:30 - 20:30 EDT) as uncompressed 1-hour WAV files at 32 kHz, which were later converted to FLAC. We then extracted 1302 10-minute recordings from this collection.
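As a rough worked example of what these figures imply, the sketch below estimates the peak sound pressure at which the recorder would clip. It assumes that the stated microphone sensitivity and preamplifier gain simply add in decibels and that no other gain stage sits between the microphone and the ADC; neither assumption is confirmed in this description, and the function names are illustrative only.

```python
import math

MIC_SENSITIVITY_DB = -25.0  # dB re 1 V/Pa (from the description)
PREAMP_GAIN_DB = 33.0       # dB (from the description)
ADC_CLIP_VOLTS = 0.9        # ADC clips at +/-0.9 V (from the description)
REF_PRESSURE_PA = 20e-6     # standard 20 uPa reference pressure in air


def clipping_pressure_pa():
    """Peak pressure (Pa) that drives the ADC to full scale.

    Assumption: sensitivity and preamp gain combine additively in dB.
    """
    volts_per_pascal = 10 ** ((MIC_SENSITIVITY_DB + PREAMP_GAIN_DB) / 20)
    return ADC_CLIP_VOLTS / volts_per_pascal


def sample_to_pascal(sample, bits=16):
    """Convert a signed integer sample to pressure in Pa (same assumptions)."""
    full_scale = 2 ** (bits - 1)
    return sample / full_scale * clipping_pressure_pa()


def peak_level_db(pressure_pa):
    """Peak level in dB re 20 uPa for a given instantaneous pressure."""
    return 20 * math.log10(abs(pressure_pa) / REF_PRESSURE_PA)


clip_level_db = peak_level_db(clipping_pressure_pa())
```

Under these assumptions the clipping point is about 0.36 Pa, or roughly 85 dB re 20 µPa peak. A full-scale sine wave would have an RMS level about 3 dB lower.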
Sampling and annotation protocol
We provide a collection of 10-minute recordings as three datasets: DatasetSIMR, DatasetMABI, and DatasetACAD. For the DatasetSIMR dataset, we selected 429 10-minute recordings corresponding to point-count surveys conducted concurrently in the field between 5 and 10 am at 104 sites across all four parks in 2022 and 2023. For the ACAD (n = 396) and MABI (n = 477) datasets, we extracted 10-minute recordings 40 minutes after local sunrise on clear days. These clear days were selected during the peak breeding season (15 May - 7 Jul) with the goal of selecting approximately 20 mornings per site. At a few sites, we had fewer than 20 recordings due to equipment failure. For the MABI dataset, 477 10-minute recordings were collected at 24 sites in 2022. For the ACAD dataset, 396 10-minute recordings were collected from 23 sites, with recordings in 2022 (2 sites) and 2023 (21 sites).
Annotators created an annotation box around every vocalization they could recognize, ignoring those that were too faint or unidentifiable. Raven Pro 1.6 software was used to annotate the data. The provided labels contain full bird vocalizations, boxed in time and frequency space. Annotators were allowed to combine multiple consecutive calls of one species into one bounding box label if pauses between calls were shorter than two seconds. We used the standard four-letter code for bird species in accordance with the 65th AOU supplement (Chesser et al., 2025).
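The two-second merging rule can be illustrated with a short sketch. This is not the annotators' tooling (labels were drawn manually in Raven Pro, and merging was optional); it only shows how consecutive calls of one species would collapse into a single box under that rule, using made-up times. The function name and the example values are hypothetical.

```python
def merge_calls(calls, max_pause_s=2.0):
    """Merge consecutive (start_s, end_s) calls of ONE species.

    Calls separated by a pause shorter than max_pause_s share a box.
    """
    merged = []
    for start, end in sorted(calls):
        if merged and start - merged[-1][1] < max_pause_s:
            # Pause is shorter than the threshold: extend the current box.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # First call, or the pause is too long: open a new box.
            merged.append([start, end])
    return [tuple(box) for box in merged]


# Hypothetical call times in seconds, not taken from the dataset.
example_calls = [(0.0, 1.0), (2.5, 3.0), (5.0, 6.0), (9.0, 9.5)]
merged_boxes = merge_calls(example_calls)
```

Here the first two calls are 1.5 s apart and merge into one box, while the third starts exactly 2.0 s after the merged box ends and therefore stays separate. Because merging was at the annotators' discretion, box counts in the dataset should not be read as counts of individual calls.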
We annotated the recordings in a three-step process. In the first round, annotators were instructed to annotate all visible vocalizations and assign species-level identifications. This round was performed by a mix of amateur and experienced birders, and differences in annotator experience led to some variation in which acoustic events were annotated versus left unannotated. The second round focused on confirming uncertain identifications, reviewing species identifications, and adding any vocalizations missed in the first round. The third round aimed to verify all annotations, add any remaining missed vocalizations, and ensure consistency and completeness; it was performed by an experienced ornithologist who was not involved in the first two rounds. The second and third rounds were carried out exclusively by experienced local birders and ornithologists. Despite this rigorous multi-stage process, which improved consistency and added annotations that were previously missed, the final dataset may still contain a small number of unlabelled bird and mammal sounds, and faint or ambiguous vocalizations may remain unannotated. These omissions are inherent to the manual review of large-scale recordings, where high-density choruses or faint, distant signals may elude detection.
Files in this collection
Please read the ReadMe.txt file for a complete description of each file included in the collection. All 10-minute recordings for each dataset can be accessed by downloading and extracting the corresponding recordings zip file (e.g., DatasetSIMR_Recordings.zip). The recording filenames contain a file ID, site (recording location), date, and timestamp in EDT. As an example, the file “6001.41.01x.ACAD3002_20220519_050000.flac” has file ID 6001 and was recorded at site "ACAD3002" on 19 May 2022 at 05:41:00 EDT.
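A minimal sketch of parsing such a filename follows. The field layout is inferred from the single example above and is an assumption: the second dot-separated field ("41") is treated as a minute offset added to the timestamp at the end of the name, which reproduces the stated 05:41:00 start. The meaning of the third field ("01x") is not described here, so it is kept as an opaque tag; ReadMe.txt should be treated as authoritative. The regular expression and function name are illustrative.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from one example filename; an assumption, not a spec.
FILENAME_RE = re.compile(
    r"^(?P<file_id>\d+)\.(?P<minute>\d{2})\.(?P<tag>[^.]+)\."
    r"(?P<site>[A-Za-z0-9]+)_(?P<date>\d{8})_(?P<time>\d{6})\.flac$"
)


def parse_recording_filename(name):
    """Split a recording filename into its parts (layout assumed)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename layout: {name!r}")
    base_time = datetime.strptime(
        match["date"] + match["time"], "%Y%m%d%H%M%S"
    )
    return {
        "file_id": match["file_id"],
        "tag": match["tag"],  # meaning not described in this summary
        "site": match["site"],
        # Assumption: the second field is a minute offset from base_time.
        "start": base_time + timedelta(minutes=int(match["minute"])),
    }


info = parse_recording_filename("6001.41.01x.ACAD3002_20220519_050000.flac")
```

The returned start time is a naive local time with no timezone attached; callers who need absolute times would have to add the offset themselves.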
Ground truth annotation text files generated using Raven Pro 1.6 for each 10-minute recording in each dataset can be downloaded and extracted from the corresponding annotations zip file (e.g., DatasetSIMR_Annotations.zip). Each row of an annotation text file represents one boxed vocalization and specifies the start and end time in seconds, the low and high frequency in Hertz, a 4-letter AOU species code, and the corresponding recording_filename. See "data_dictionary.csv" for a description of each field in the text file. The species codes can be mapped to the scientific and common names of each species using the “species.csv” file.
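A sketch of reading one annotation file is given below. It assumes the files are tab-delimited with a header row and that the time and frequency columns use the headers of Raven Pro's default selection-table export; the "Species" column name is a placeholder, since the actual header is not stated here. Check "data_dictionary.csv" before relying on any of these names. The two sample rows are made up.

```python
import csv
import io

# Assumed headers: Raven Pro default names for time and frequency;
# "Species" is a hypothetical name for the species-code column.
BEGIN, END = "Begin Time (s)", "End Time (s)"
LOW, HIGH = "Low Freq (Hz)", "High Freq (Hz)"
SPECIES = "Species"


def read_annotations(handle):
    """Read boxed vocalizations from a tab-delimited annotation table."""
    boxes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        boxes.append({
            "species": row[SPECIES],
            "start_s": float(row[BEGIN]),
            "end_s": float(row[END]),
            "low_hz": float(row[LOW]),
            "high_hz": float(row[HIGH]),
        })
    return boxes


def seconds_per_species(boxes):
    """Total boxed duration in seconds for each species code."""
    totals = {}
    for box in boxes:
        code = box["species"]
        totals[code] = totals.get(code, 0.0) + box["end_s"] - box["start_s"]
    return totals


# Made-up table for illustration; not real rows from the dataset.
sample = (
    "Selection\tBegin Time (s)\tEnd Time (s)\t"
    "Low Freq (Hz)\tHigh Freq (Hz)\tSpecies\n"
    "1\t1.5\t3.5\t2000\t6000\tBTNW\n"
    "2\t10.0\t12.5\t1500\t4000\tHETH\n"
)
boxes = read_annotations(io.StringIO(sample))
```

With a real file, the same function can be called on `open(path, newline="")`. Overlapping boxes of the same species are summed independently here, so the totals can exceed the actual time a species was audible.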
The spatial information for each site is provided in the “site_metadata.txt” file, which includes site and park identifiers, geographic coordinates (WGS84 latitude and longitude), the number of 10-minute recordings collected at the site, and indicators of whether the site was recorded in 2022 and 2023. Each row of "recordings_metadata.csv" describes one 10-minute recording with associated information such as recording_filename, annotation_filename, a unique item identifier (itemID), dataset identifier (dataset_ID: DatasetACAD, DatasetMABI, DatasetSIMR), site identifier (siteID), date (YYYYMMDD), time (HHMM), year (YYYY), and park identifier (Park_ID). See "data_dictionary.csv" for further description of fields across CSV files. We also provide a test audio file designed for pre-deployment microphone testing (AudioTestFileforRecorders.wav). It contains pure tones from 500 to 9,500 Hz at 1-kHz intervals, with five amplitude levels per frequency.
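After downloading, a quick consistency check is to count rows of "recordings_metadata.csv" per dataset and compare them with the totals stated in this description (429, 396, and 477, which sum to 1302). The sketch below assumes a comma-delimited file with a header row and a column named exactly dataset_ID; the identifier spellings follow the dataset names used in this description and should be verified against the file itself. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Recording counts per dataset as stated in this description.
EXPECTED = {"DatasetSIMR": 429, "DatasetACAD": 396, "DatasetMABI": 477}


def count_by_dataset(handle):
    """Count metadata rows per dataset_ID (column name assumed)."""
    return Counter(row["dataset_ID"] for row in csv.DictReader(handle))


def find_mismatches(counts, expected=EXPECTED):
    """Return {dataset_ID: (found, expected)} for counts that differ."""
    return {
        name: (counts.get(name, 0), n)
        for name, n in expected.items()
        if counts.get(name, 0) != n
    }


# Tiny made-up table to show the call pattern; not real metadata.
sample = (
    "recording_filename,dataset_ID\n"
    "a.flac,DatasetACAD\n"
    "b.flac,DatasetACAD\n"
    "c.flac,DatasetMABI\n"
)
sample_counts = count_by_dataset(io.StringIO(sample))
sample_mismatches = find_mismatches(
    sample_counts, {"DatasetACAD": 2, "DatasetMABI": 2}
)
```

On the real file, an empty result from `find_mismatches(count_by_dataset(open(path, newline="")))` would indicate that the download matches the stated totals.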
Acknowledgements
Compiling this extensive dataset was a significant undertaking, and we are grateful to the domain experts who helped collect and manually annotate the data for this collection (individual contributors in alphabetical order): Gillian Audier, Kyle Burton, Jack Bushong, Brooke D. Goodman, Alexander Harris, Brian Hofstetter, Lakshmi Meghana Kesanapalli, Ethan Reilly, and Ed Sharron.
Provider:
Zenodo
Created:
2026-02-14



