Gold Standard Dataset - Iguanas from Above Project
Source: DataCite Commons | Updated: 2024-04-22 | Indexed: 2024-08-19
Download link:
https://figshare.com/articles/dataset/Gold_Standard_Dataset_-_Iguanas_from_Above_Project/25196306/1
Description:
Abstract
Population surveys are vital for wildlife management, yet traditional methods often demand excessive time and resources, leading to data gaps for many species. Modern technologies such as drones can facilitate field surveys but may also increase data-analysis challenges. Citizen Science (CS) can address this issue by engaging non-specialists in data collection and analysis. We assess the applicability of CS for population monitoring using the endangered Galápagos marine iguana as a case study, analysing online volunteers' ability to detect and count animals in aerial images. Comparing against a Gold Standard dataset (a consensus of expert counts in 4345 images), we investigated which aggregation methods produced optimal results from CS inputs, as well as the influence of image quality and of filtering out data from infrequent and anonymous participants. During the three phases of our project, hosted on the Zooniverse platform, over 13,000 volunteers made 1,375,201 classifications of 57,838 aerial images; each image was independently classified 20 (phases 1 and 2) or 30 (phase 3) times. Volunteers achieved 68% to 94% accuracy in detecting iguanas, with more false negatives than false positives. Image quality strongly influenced accuracy; after excluding data from suboptimal pilot-phase images, volunteers counted with 90% to 92% accuracy. For detecting the presence or absence of iguanas, the commonly used 'majority vote' aggregation approach (in which the answer selected by the majority of individual inputs is accepted) produced less accurate results than a minimum threshold of five 'yes' answers (out of the 20 or 30 independent classifications). For aggregating iguana counts, the HDBSCAN clustering method yielded the best results. Removing inputs from anonymous and inexperienced volunteers reduced accuracy, emphasizing the importance of considering all volunteer contributions.
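The abstract contrasts two rules for turning the 20 or 30 yes/no answers per image into a single presence decision. The Python sketch below (Python being the language of the Panoptes Aggregation package the authors used) implements both rules and scores each against expert labels. The function names and the four example images are hypothetical, chosen only for illustration, and are not data from this dataset; treating a tie as absence under the majority rule is also an assumption.

```python
def majority_vote(yes_votes, total_votes):
    """Presence accepted if more than half of the classifications say 'yes'.
    Assumption: a tie (e.g. 10 of 20) counts as absence."""
    return yes_votes > total_votes / 2


def min_threshold(yes_votes, total_votes, threshold=5):
    """Presence accepted if at least `threshold` classifications say 'yes'.
    `total_votes` is unused; it keeps the same signature as majority_vote."""
    return yes_votes >= threshold


def accuracy(rule, images):
    """Percentage of images where the rule's decision matches the expert label."""
    correct = sum(rule(yes, total) == expert for yes, total, expert in images)
    return 100 * correct / len(images)


# Hypothetical Gold Standard images: (yes votes, total classifications, expert says present)
images = [
    (18, 20, True),   # clear iguana: both rules accept it
    (7, 20, True),    # faint iguana: only the threshold rule accepts it
    (1, 20, False),   # empty image: both rules reject it
    (6, 30, False),   # empty image with 6 'yes' answers: threshold rule gives a false positive
]
print(accuracy(majority_vote, images), accuracy(min_threshold, images))
```

On these toy images both rules score 75.0, but they fail differently: the majority vote misses the faint iguana (a false negative), while the threshold rule accepts an empty image (a false positive). Since volunteers produced more false negatives than false positives, a low fixed threshold plausibly recovers more real detections than it adds errors, which is consistent with the threshold rule performing better in the study.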
We conclude that, with sufficiently good aerial images, online volunteers can accurately identify and count marine iguanas in drone images, although their tendency to underestimate warrants further consideration. Finally, although CS-based data analysis is quicker than manual counting, it still requires significant time resources; we therefore recommend developing a Machine Learning approach to address this issue.

Methods
We created a citizen science project named Iguanas from Above on Zooniverse.org, to which we uploaded 'sliced' images from drone imagery of several Galápagos marine iguana colonies. Citizen scientists were asked to classify the images in two tasks: first, to answer yes or no as to whether iguanas were present in the image, and second, to count the individuals when present. Each image was classified by 20 or 30 volunteers. Once all images from the three launched phases had been classified, we downloaded the data from the Zooniverse portal and used the Panoptes Aggregation Python package to extract and aggregate the CS data. We randomly selected 5–10% of all images to create a Gold Standard (GS) dataset. Three experts from the research team identified the presence or absence of marine iguanas in these images and counted them. Their consensus answers are presented in this dataset and are referred to as expert data. The aggregated CS data from Task 1 (the total number of yes and no answers per image) were interpreted with a minimum threshold rule: iguana presence was accepted when 5 or more volunteers (out of 20 or 30) answered yes; otherwise, absence was accepted.
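Before the minimum threshold rule can be applied, the individual Task 1 answers have to be tallied per image. A minimal sketch follows, assuming the classifications have already been flattened into hypothetical `(image_id, answer)` pairs; this is a simplified stand-in and not the export format of Zooniverse or the output format of the Panoptes Aggregation package.

```python
from collections import Counter, defaultdict


def tally_answers(classifications):
    """Count 'yes' and 'no' answers per image from (image_id, answer) pairs."""
    tallies = defaultdict(Counter)
    for image_id, answer in classifications:
        tallies[image_id][answer] += 1
    return tallies


def accept_presence(tallies, threshold=5):
    """Minimum threshold rule: present if at least `threshold` volunteers answered 'yes'.
    A Counter returns 0 for a missing key, so images with no 'yes' answers are absent."""
    return {image_id: counts["yes"] >= threshold for image_id, counts in tallies.items()}


# Hypothetical flattened export for two images classified by 20 volunteers each.
classifications = (
    [("img_1", "yes")] * 6 + [("img_1", "no")] * 14
    + [("img_2", "yes")] * 3 + [("img_2", "no")] * 17
)
tallies = tally_answers(classifications)
print(dict(tallies["img_1"]), accept_presence(tallies))
```

Keeping the raw tallies rather than only the boolean decision makes it cheap to re-run the comparison with other thresholds or with a majority vote.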
We then compared all accepted CS answers against the expert data, scoring each as correct or incorrect, and calculated the percentage of CS accuracy for marine iguana detection. For Task 2, we selected all images that volunteers identified as containing iguanas under this minimum threshold rule and aggregated (summarized) all classifications into one value (count) per image, using the statistical metrics median and mode and the spatial clustering methods DBSCAN and HDBSCAN. All remaining images were assigned a count of 0. These CS data were incorporated into this dataset. We then compared the total counts in this GS dataset obtained by the experts and by each aggregation method, expressed as percentages of agreement with the expert data; these percentages represent CS accuracy for marine iguana counting. We also investigated the number of marine iguanas under- and overestimated by each aggregation method. Finally, we applied generalized linear models to this dataset to explore statistical differences among the methods used to count marine iguanas in the images (expert, median, mode and HDBSCAN), and to assess how the phase analyzed, the image quality (assessed by the experts) and the number of marine iguanas present in the image could affect CS accuracy.
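The Task 2 aggregation and the total-count comparison can be sketched as follows. Everything here is an illustrative assumption: the image IDs, counts and marker coordinates are made up, `eps` and `min_samples` are arbitrary, and "percentage of agreement" is read as the aggregated CS total divided by the expert total. `dbscan_count` is a plain textbook DBSCAN that counts clusters of pooled volunteer markers (noise excluded) as iguanas; it is not the Panoptes Aggregation code, and HDBSCAN is not reproduced here (scikit-learn 1.3+ provides `sklearn.cluster.HDBSCAN`). The median of an even number of counts can be fractional; the sketch leaves it unrounded because the source does not say how such cases were handled.

```python
import math
from statistics import median, mode


def dbscan_count(points, eps, min_samples):
    """Count DBSCAN clusters among (x, y) markers; noise points are not counted."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise, >= 0 = cluster id

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    n_clusters = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = n_clusters  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = n_clusters  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = n_clusters
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        n_clusters += 1
    return n_clusters


def compare_totals(aggregated, expert):
    """Total-count agreement (%) and iguanas under-/overestimated relative to the expert."""
    agreement = 100 * sum(aggregated.values()) / sum(expert.values())
    under = sum(max(expert[i] - aggregated[i], 0) for i in expert)
    over = sum(max(aggregated[i] - expert[i], 0) for i in expert)
    return agreement, under, over


# Hypothetical GS subset. Only img_1 passed the minimum threshold rule, so only it
# has individual volunteer counts and pooled marker coordinates (in pixels).
expert_counts = {"img_1": 2, "img_2": 0, "img_3": 1}
volunteer_counts = {"img_1": [2, 2, 1, 2, 3, 2]}
volunteer_markers = {"img_1": [(10, 10), (11, 10), (10, 11),
                               (50, 50), (51, 51), (50, 49), (90, 90)]}

methods = {
    "median": lambda i: median(volunteer_counts[i]),
    "mode": lambda i: mode(volunteer_counts[i]),
    "DBSCAN": lambda i: dbscan_count(volunteer_markers[i], eps=3, min_samples=3),
}
results = {}
for name, count_image in methods.items():
    # Images not accepted as "present" are assigned a count of 0.
    counts = {i: count_image(i) if i in volunteer_counts else 0 for i in expert_counts}
    results[name] = compare_totals(counts, expert_counts)
    print(name, results[name])
```

On this toy input all three methods count 2 iguanas in img_1 (DBSCAN treats the lone marker at (90, 90) as noise), so each recovers about 66.7% of the expert total and underestimates by exactly one iguana: the one in img_3, which never passed the Task 1 threshold. In miniature, this shows how a detection false negative propagates into a counting underestimate.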
Provided by:
figshare
Created:
2024-02-09



