Data from: A demonstration of unsupervised machine learning in species delimitation
Source: DataCite Commons. Updated 2025-06-01; indexed 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.nj2mg77
Description:
One major challenge to delimiting species with genetic data is
successfully differentiating population structure from species-level
divergence, an issue exacerbated in taxa inhabiting naturally fragmented
habitats. Many fields of science are now using machine learning, and in
evolutionary biology supervised machine learning has recently been used to
infer species boundaries. These supervised methods require training data
with associated labels. Conversely, unsupervised machine learning (UML)
uses inherent data structure and does not require user-specified training
labels, potentially providing more objectivity in species delimitation.
Here we demonstrate the utility of three UML approaches (random forests,
variational autoencoders, t-distributed stochastic neighbor embedding) for
species delimitation in an arachnid taxon with high population genetic
structure (Opiliones, Laniatores, Metanonychus). We find that UML
approaches successfully cluster samples according to species-level
divergences and not high levels of population structure, while model-based
validation methods severely over-split putative species. UML offers
intuitive data visualization in two-dimensional space and the ability to
accommodate various data types, and it has potential in many areas of
systematic and evolutionary biology. We argue that machine learning
methods are ideally suited for species delimitation and may perform well
in many natural systems and across taxa with diverse biological
characteristics.
Provider:
Dryad
Created:
2019-07-05



