Modelling heterogeneity in the classification process in multi-species distribution models can improve predictive performance
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.0rxwdbs51
Description:
Species distribution models and maps from large-scale biodiversity data
are necessary for conservation management. One current issue is that
biodiversity data are prone to taxonomic misclassifications. Methods to
account for these misclassifications in multispecies distribution models
have assumed that the classification probabilities are constant throughout
the study. In reality, classification probabilities are likely to vary
with several covariates. Failure to account for such heterogeneity can
lead to biased prediction of species distributions. Here, we
present a general multispecies distribution model that accounts for
heterogeneity in the classification process. The proposed model assumes a
multinomial generalized linear model for the classification confusion
matrix. We compare the performance of the heterogeneous classification
model to that of the homogeneous classification model by assessing how
well each estimates the model parameters and how well it predicts
hold-out samples. We applied the model to gull data from
Norway, Denmark, and Finland, obtained from the Global Biodiversity
Information Facility. Our simulation study showed that accounting for
heterogeneity in the classification process increased the precision of
predictions of the true species identity by 30%, and accuracy and recall
by 6%. Because all the models in this study accounted for misclassification
in some form, accounting for heterogeneity in the classification process
had no significant effect on inference about the ecological process.
Applying the model framework to the gull dataset showed no difference in
predictive performance between the homogeneous and heterogeneous models
(with parametric distributions), owing to the relatively small number of
misclassified samples. However, when machine learning predictive scores were used
as weights to inform the species distribution models about the
classification process, precision increased by 70%. We recommend using
multiple multinomial regression to model the variation in the
classification process when the data contain relatively many
misclassified samples, and machine learning prediction scores when the
data contain relatively few.
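The heterogeneous classification model summarized above treats each row of the classification confusion matrix as a multinomial generalized linear model. The Python sketch below illustrates that idea under assumptions of our own, not the authors' implementation: each row (true species k) is a multinomial logit (softmax) of covariate-linear predictors; the correct-identification category is the zero-coefficient reference for identifiability; and reported-class probabilities are obtained by mixing true-species probabilities through the confusion matrix. All names (`beta`, `x`, `confusion_matrix`, `reported_probs`) and coefficient values are hypothetical.

```python
import math

def softmax(eta):
    """Numerically stable softmax: maps linear predictors to probabilities."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    total = sum(exps)
    return [e / total for e in exps]

def confusion_row(beta_k, x):
    """P(reported = j | true = k, x) for every reported class j.

    beta_k[j] holds [intercept, slope_1, ..., slope_p] for reported class j.
    Assumption: the correct class (j == k) is the reference category, so its
    coefficients are all zero.
    """
    design = [1.0] + list(x)  # prepend the intercept term
    eta = [sum(b * d for b, d in zip(coefs, design)) for coefs in beta_k]
    return softmax(eta)

def confusion_matrix(beta, x):
    """Covariate-dependent confusion matrix; row k is true species k."""
    return [confusion_row(beta_k, x) for beta_k in beta]

def reported_probs(true_probs, conf):
    """Assumed mixing step: P(reported = j | x) = sum_k P(true = k | x) * conf[k][j]."""
    n = len(conf[0])
    return [sum(p * row[j] for p, row in zip(true_probs, conf)) for j in range(n)]

# Hypothetical coefficients: two species, one standardized covariate.
beta = [
    [[0.0, 0.0], [-2.0, 1.0]],   # true species 0 (reported 0 is the reference)
    [[-1.5, -0.5], [0.0, 0.0]],  # true species 1 (reported 1 is the reference)
]

for x in ([0.0], [2.0]):
    conf = confusion_matrix(beta, x)
    print(x, [[round(p, 3) for p in row] for row in conf])
```

With these made-up coefficients, the probability of reporting species 1 when the true species is 0 rises from about 0.12 at x = 0 to 0.5 at x = 2. Setting every slope to zero recovers the homogeneous model, in which the confusion matrix no longer depends on the covariate.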
Provider:
Dryad
Created:
2024-03-05



