Dataset for: Modelling imperfect presence data obtained by citizen science

Mendeley Data, updated 2024-06-29, indexed 2024-06-29
Download link:
https://wiley.figshare.com/articles/dataset/Dataset_for_Modelling_imperfect_presence_data_obtained_by_citizen_science/4818013/1
Description:
There is growing awareness about the potential benefit of harnessing citizen science for research, particularly in the biological and environmental sciences. Data quality is a major constraint in the use of citizen-science data; in particular, imperfect observations. In this paper we fit species distribution models (SDMs) to presence-only data (presences and counts, with no absences observed) by exploiting the uncertainty in reported presences, instead of generating pseudo-absences as is common in previous presence-only studies. This approach allowed us to extend the suite of models to include those commonly fit to presence/absence and abundance data. We fit several models to a case study dataset of jaguar encounters reported by citizens in the Peruvian Amazon. The true species distribution for the case study data is unknown, and so we also undertake an extensive simulation study to evaluate model performance. We analyze the sources of error by studying the bias and variance of the models, and also discuss the predictive performance of each model and its ability to recover the true species distribution. The simulation study shows that although several approaches are capable of recovering the species distribution, the choice of a modelling approach is a complex one, and depends on factors such as inferential aim, model complexity, sample size and computational resources. This study also addresses some issues in dealing with compound-imperfect observations arising from citizen-science data, and we discuss further steps needed in this research area.

Created:
2023-06-28