
Iguanas from Above - Raw classifications data and resulting Gold Standard counts

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://figshare.com/articles/dataset/Gold_Standard_Dataset_-_Iguanas_from_Above_Project/25196306
Description:
Abstract
Population surveys are vital for wildlife management, yet traditional methods are typically effort-intensive, leading to data gaps. Modern technologies, such as drones, facilitate field surveys but increase the data analysis burden. Citizen Science (CS) can alleviate this issue by engaging non-specialists in data collection and analysis. We evaluated this approach for population monitoring using the endangered Galápagos marine iguana as a case study, assessing citizen scientists' ability to detect and count animals in aerial images. Comparing against a Gold Standard dataset of expert counts in 4345 images, we explored optimal methods for aggregating CS inputs and evaluated the accuracy of CS counts. During three phases of our project, hosted on Zooniverse.org, over 13,000 volunteers made 1,375,201 classifications from 57,838 images; each image was independently classified up to 30 times. Volunteers achieved 68% to 94% accuracy in detecting iguanas, with more false negatives than false positives. Image quality strongly influenced accuracy; after excluding data from suboptimal pilot-phase images, volunteers counted with 91% to 92% accuracy. For detecting iguanas, the standard 'majority vote' aggregation approach (where the accepted answer is the one given by the majority of individual inputs) produced less accurate results than a minimum threshold of five 'yes' answers among the total independent classifications. For counting iguanas, HDBSCAN clustering yielded the best results. We conclude that CS can accurately identify and count marine iguanas from drone images, though there is a tendency to underestimate. CS-based data analysis is still resource-intensive, underscoring the need to develop a Machine Learning approach.

Methods
We created a citizen science project, named Iguanas from Above, on Zooniverse.org. There, we uploaded 'sliced' images from drone imagery of several colonies of the Galápagos marine iguana.
Citizen scientists (CS) were asked to classify the images in two tasks: first, to answer yes or no for iguana presence in the image, and second, to count the individuals when present. Each image was classified by 20 or 30 volunteers. Once all the images from the three launched phases were classified, we downloaded the data from the Zooniverse portal and used the Panoptes Aggregation Python package to extract and aggregate CS data (source code: https://github.com/cwinkelmann/iguanas-from-above-zooniverse).

We randomly selected 5–10% of all the images to create a Gold Standard (GS) dataset. Three experts from the research team identified presence and absence of marine iguanas in the images and counted them. The consensus answers are presented in this dataset and are referred to as expert data.

For Task 1, the aggregated CS data (the total number of yes and no answers per image) was accepted as iguana presence when 5 or more volunteers (of the 20–30) selected yes (a minimum threshold rule); otherwise, absence was accepted. We then compared all accepted CS answers against the expert data, marking each as correct or incorrect, and calculated the percentage of CS accuracy for marine iguana detection.

For Task 2, we selected all the images identified by the volunteers as containing iguanas under this minimum threshold rule and aggregated (summarized) all classifications into one value (count) per image, using the statistical metrics median and mode and the spatial clustering methods DBSCAN and HDBSCAN. All remaining images were assigned a count of 0. CS data was incorporated into this dataset. We then compared the total counts in this GS dataset obtained by the experts and by each aggregation method, expressed as percentages of agreement with the expert data. These percentages showed CS accuracy for marine iguana counting. We also investigated the number of marine iguanas under- and overestimated by each aggregation method.
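The Task 1 rule described above can be illustrated with a minimal sketch. This is not the project's code (that lives in the linked repository); the `Task1Votes` structure, the function names, and the example votes are all hypothetical, and treating a tie as absence under majority vote is an assumption made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task1Votes:
    """Aggregated Task 1 answers for one image (hypothetical structure)."""
    image_id: str
    yes: int
    no: int


def threshold_rule(votes: Task1Votes, min_yes: int = 5) -> bool:
    """Accept presence when at least `min_yes` volunteers answered yes."""
    return votes.yes >= min_yes


def majority_vote(votes: Task1Votes) -> bool:
    """Accept presence only when yes answers outnumber no answers.

    Assumption: a tie counts as absence.
    """
    return votes.yes > votes.no


def detection_accuracy(votes, expert_presence, rule) -> float:
    """Percentage of images whose accepted CS answer matches the expert answer."""
    correct = sum(rule(v) == expert_presence[v.image_id] for v in votes)
    return 100.0 * correct / len(votes)


# Made-up example data, not taken from the dataset.
votes = [
    Task1Votes("img_a", yes=18, no=2),
    Task1Votes("img_b", yes=6, no=14),
    Task1Votes("img_c", yes=1, no=19),
    Task1Votes("img_d", yes=4, no=26),
]
expert_presence = {"img_a": True, "img_b": True, "img_c": False, "img_d": True}

threshold_acc = detection_accuracy(votes, expert_presence, threshold_rule)
majority_acc = detection_accuracy(votes, expert_presence, majority_vote)
print(f"threshold rule: {threshold_acc:.0f}%  majority vote: {majority_acc:.0f}%")
```

In this toy example, `img_b` has only 6 of 20 yes answers: the threshold rule accepts it as presence while majority vote rejects it. This shows how a low threshold can reduce false negatives.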
Finally, by applying generalized linear models, we used this dataset to explore statistical differences among the methods used to count marine iguanas in the images (expert, median, mode and HDBSCAN), and how three factors could affect CS accuracy: the phase analyzed, the quality of the images (assessed by the experts), and the number of marine iguanas present in the image.
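As a rough illustration of the Task 2 count aggregation and comparison described in the Methods, the sketch below covers only the median and mode metrics. DBSCAN and HDBSCAN work on the marked point coordinates and need third-party libraries, so they are left out. The function names and example counts are hypothetical. Two choices are assumptions made here, not the dataset's definitions: "percentage of agreement" is computed as the CS total divided by the expert total, and mode ties resolve to the smallest count.

```python
from statistics import median, multimode


def aggregate_count(counts, method="median"):
    """Collapse the volunteers' counts for one image into a single value.

    Images with no counts (not accepted by the threshold rule) get 0.
    """
    if not counts:
        return 0
    if method == "median":
        return median(counts)
    if method == "mode":
        # Assumption: ties between equally frequent counts resolve to the smallest.
        return min(multimode(counts))
    raise ValueError(f"unknown method: {method}")


def compare_to_expert(expert_counts, cs_counts):
    """Summarize CS counts against expert counts over the same images."""
    expert_total = sum(expert_counts.values())
    cs_total = sum(cs_counts[i] for i in expert_counts)
    under = sum(max(expert_counts[i] - cs_counts[i], 0) for i in expert_counts)
    over = sum(max(cs_counts[i] - expert_counts[i], 0) for i in expert_counts)
    return {
        # Assumed definition: CS total as a percentage of the expert total.
        "agreement_pct": 100.0 * cs_total / expert_total,
        "underestimated": under,
        "overestimated": over,
    }


# Made-up example data, not taken from the dataset.
# img_d did not pass the threshold rule, so it has no volunteer counts.
volunteer_counts = {
    "img_a": [3, 4, 4, 5, 4],
    "img_b": [1, 2, 2, 1, 2],
    "img_c": [7, 9, 8, 8, 10],
}
expert_counts = {"img_a": 4, "img_b": 3, "img_c": 8, "img_d": 0}

cs_median = {
    image_id: aggregate_count(volunteer_counts.get(image_id, []), "median")
    for image_id in expert_counts
}
summary = compare_to_expert(expert_counts, cs_median)
print(cs_median)
print(summary)
```

Passing `"mode"` instead of `"median"` to `aggregate_count` gives the second statistical metric, and the same comparison can be run on the resulting counts.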

Created:
2024-02-09