specleanr: An R package for automated flagging of environmental outliers in ecological data for modeling workflows
Source: DataCite Commons | Updated: 2026-01-29 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.6m905qgd7
Description:
Developing species distribution models (SDMs) requires high-quality species
occurrence records. These records, stemming from varying sources with
different sampling procedures, are often archived in open-access
databases, which makes automated data quality checks essential. Temporal,
geographic, and taxonomic quality checks are usually conducted in SDM
workflows, but checking for records distant in environmental space, i.e.,
outliers, is often ignored. Here, we present specleanr, an R package that
contains 20 outlier detection methods (ODMs) that can be ensembled to
identify potential outliers in environmental predictors. These methods are
categorized into (i) species-specific ecological range, (ii) univariate,
and (iii) multivariate ODMs. All potential outliers flagged from the
different methods are pooled to identify absolute outliers (records
appearing in multiple methods). The local regression (LOESS) method is
then used to automatically set a threshold that optimally identifies the
absolute outliers. Records can also be clustered into poor, fair, moderate,
very strong, and perfect outliers, plus non-outliers, based on each
record's likelihood of being a potential outlier, which allows expert
assessment. We demonstrated the approach on 15 fish species from the
Danube River Basin, including native, alien, threatened, and common
species. We fitted SDMs using bioclimatic and hydromorphological
parameters. We compared the model Area Under the Curve (AUC) before and
after outlier removal using three scenarios: (1) the LOESS method, (2)
removing very strong outliers, and (3) removing perfect outliers. The
results showed a significant improvement in the model AUC with generally
small to moderate effect sizes after outlier removal. specleanr is
generalizable across taxonomic groups, data types, ecological realms, and
geographic regions. Beyond SDM, it can also be broadly used in general
data analysis where outlier detection is essential. We provide vignettes
to support use of the package. specleanr offers a user-friendly and
reproducible approach for handling outliers in biogeographical modeling
and general data analysis workflows.
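
The pooling idea described above (flag records with several methods, then keep those flagged by multiple methods) can be illustrated with a minimal sketch. This is not the specleanr API: the function names, the three univariate methods chosen (z-score, interquartile range, median absolute deviation), their cutoffs, and the fixed `threshold` are all assumptions for illustration. In particular, a fixed share of methods stands in for the LOESS-based threshold selection the package is described as using.

```python
from statistics import mean, median, pstdev, quantiles


def zscore_flags(values, cutoff=3.0):
    """Flag values more than `cutoff` population standard deviations from the mean."""
    mu, sd = mean(values), pstdev(values)
    return [sd > 0 and abs(v - mu) / sd > cutoff for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lower or v > upper for v in values]


def mad_flags(values, cutoff=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds `cutoff`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [mad > 0 and 0.6745 * abs(v - med) / mad > cutoff for v in values]


def flag_share(values, methods=(zscore_flags, iqr_flags, mad_flags)):
    """Return, per record, the share of methods that flagged it."""
    flags = [method(values) for method in methods]
    return [sum(column) / len(methods) for column in zip(*flags)]


def pooled_outliers(values, threshold=0.6):
    """Return indices of records flagged by at least `threshold` of the methods.

    `threshold` is a hand-picked assumption, not a LOESS-derived value.
    """
    return [i for i, share in enumerate(flag_share(values)) if share >= threshold]


# Hypothetical temperature values at ten occurrence records.
temperatures = [11.2, 11.8, 12.1, 12.4, 12.6, 12.9, 13.1, 13.3, 13.7, 29.5]
print(pooled_outliers(temperatures))
```

In this toy data the extreme value inflates the standard deviation enough that the z-score method alone misses it, while the two robust methods flag it. That kind of disagreement between single methods is one motivation for pooling flags across several of them.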
Provider:
Dryad
Created:
2025-11-04



