A collection of fully-annotated soundscape recordings from neotropical coffee farms in Colombia and Costa Rica
Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/7525349
Description:
This collection contains 34 hour-long soundscape recordings annotated by expert ornithologists, who provided 6,952 bounding box labels for 89 bird species from Colombia and Costa Rica. The data were recorded in 2019 in two highly diverse neotropical coffee farm landscapes, in the towns of Jardín, Colombia and San Ramón, Costa Rica. Parts of this collection were featured as test data in the 2021 BirdCLEF competition, and it can primarily be used for training and evaluating machine learning algorithms.

### Data collection

Monitoring the avifauna of coffee farms is a useful way to measure the impact of sustainability efforts in productive landscapes. Diverse bird communities provide services to coffee farms, and these services are enhanced by increased tree cover (e.g. shade trees, windbreaks) and by the protection of forest remnants within or close to coffee farms. Our limited knowledge of the relative contributions of different types of tree cover has motivated the scientific community to collect bird data in coffee landscapes and, using bioacoustics, to correlate different management strategies with the presence or absence of target species. To this end, we collected a set of bird recordings to track the presence of target species on coffee farms.

The annotated data set is currently being used to measure the impact of pesticide applications and other management practices (e.g. pruning) by testing for differences in bird activity before and after such an intervention. In addition, the bird call annotations help us train machine learning models that will let us monitor these farms automatically.

Soundscapes for this collection were recorded using SWIFT recorders positioned 3 m above the ground. We recorded one-hour-long sound files at 48 kHz from 4:30 to 7:30 to capture the most active period of the avian dawn chorus.
In the same way, we recorded from 16:00 to 19:00 to capture the second peak in avian acoustic activity, which occurs before sunset, and to include nocturnal species. All audio was unified, converted to FLAC, and resampled to 32 kHz for this collection. Parts of this dataset have previously been used in the 2021 BirdCLEF competition.

### Sampling and annotation protocol

We subsampled data for this collection by randomly selecting recordings from different farm locations and dates. Using Raven Pro, annotators were asked to draw a selection box around every bird call they could recognize, ignoring those that were too faint or unidentifiable. Overlapping selections were allowed. The provided labels contain full bird calls boxed in time and frequency. Annotators could combine multiple consecutive calls of the same species into one bounding box label if the pauses between calls were shorter than 5 seconds. We converted labels to eBird species codes following the 2021 eBird taxonomy (Clements list). Unidentifiable calls were marked with “????” and added as bounding box labels to the ground truth annotations.

### Files in this collection

Audio recordings can be accessed by downloading and extracting the “soundscape_data.zip” file. Soundscape recording filenames contain a sequential file ID, site ID, recording date, and timestamp in local time (Costa Rica: GMT-6; Colombia: GMT-5). For example, the file “NES_001_S01_20190914_043000.flac” has sequential ID 001 and was recorded at site S01 on September 14, 2019 at 04:30:00 local time.

Ground truth annotations are listed in “annotations.csv”, where each line specifies the corresponding filename, start and end time in seconds, low and high frequency in hertz, and an eBird species code. These species codes can be mapped to the scientific and common name of a species using the “species.csv” file. Geographical coordinates of the San Ramón and Jardín regions can be found in the “recording_location.txt” file.
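
The filename convention and annotation fields described above can be read programmatically. The following is a minimal Python sketch, not code shipped with the dataset. The filename pattern is inferred from the single example given; the names `parse_recording_filename`, `parse_annotation_row`, `RecordingInfo`, and `Annotation` are hypothetical; and the column order of “annotations.csv” is assumed to match the order in which the description lists the fields. The description does not say which site IDs belong to which country, so the caller has to supply the country.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Sequence

# Pattern inferred from the one documented example,
# "NES_001_S01_20190914_043000.flac" (assumption: all files follow it).
FILENAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z]+)_(?P<file_id>\d+)_(?P<site>S\d+)_"
    r"(?P<date>\d{8})_(?P<time>\d{6})\.flac$"
)

# UTC offsets as stated in the dataset description.
UTC_OFFSET_HOURS = {"Colombia": -5, "Costa Rica": -6}


@dataclass(frozen=True)
class RecordingInfo:
    file_id: str
    site: str
    start: datetime  # timezone-aware local start time


@dataclass(frozen=True)
class Annotation:
    filename: str
    start_s: float
    end_s: float
    low_hz: float
    high_hz: float
    species_code: str  # eBird code, or "????" for unidentifiable calls


def parse_recording_filename(filename: str, country: str) -> RecordingInfo:
    """Split a soundscape filename into file ID, site ID, and start time."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    tz = timezone(timedelta(hours=UTC_OFFSET_HOURS[country]))
    start = datetime.strptime(
        match["date"] + match["time"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=tz)
    return RecordingInfo(match["file_id"], match["site"], start)


def parse_annotation_row(row: Sequence[str]) -> Annotation:
    """Convert one CSV row to an Annotation.

    Assumed column order: filename, start, end, low frequency,
    high frequency, species code.
    """
    filename, start_s, end_s, low_hz, high_hz, code = row
    return Annotation(
        filename, float(start_s), float(end_s),
        float(low_hz), float(high_hz), code.strip(),
    )
```

Rows could be fed to `parse_annotation_row` from `csv.reader`, skipping a header row if the file has one. It is worth checking the actual header of “annotations.csv” before relying on the assumed column order.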
The exact coordinates of individual recording locations are not included due to data privacy.

### Acknowledgements

Compiling this extensive dataset was a major undertaking, and we are very thankful to the domain experts who helped collect and manually annotate the data for this collection. Specifically, we want to thank (in alphabetical order): Alejandro Quesada, José Castaño, Luis Parra. We also thank Carlos Gamboa-Venegas (RedCONARE, CeNAT) for providing technical assistance with information transfer. We would also like to acknowledge our funding source: the Nespresso AAA Sustainable Quality™ Program.
Created:
2023-06-28



