Evaluating the performance of probabilistic algorithms for phylogenetic analysis of big morphological datasets: a simulation study
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.36cq8k2
Description:
Reconstructing the tree of life is an essential task in evolutionary
biology. It demands accurate phylogenetic inference for both extant and
extinct organisms, the latter being almost entirely dependent on
morphological data. While parsimony methods have traditionally dominated
the field of morphological phylogenetics, a rapidly growing number of
studies are now employing probabilistic methods (maximum likelihood and
Bayesian inference). The present-day toolkit of probabilistic methods
offers varied software with distinct algorithms and assumptions for
reaching global optimality. However, benchmark performance assessments of
different software packages for the analyses of morphological data,
particularly in the era of big data, are still lacking. Here, we test the
performance of four major probabilistic software packages under variable taxonomic
sampling and missing data conditions: the Bayesian inference-based
programs MrBayes and RevBayes, and the maximum likelihood-based IQ-TREE
and RAxML. We evaluated software performance by calculating the distance
between inferred and true trees using a variety of metrics, including
Robinson-Foulds (RF), Matching Splits (MS), and Kuhner-Felsenstein (KF)
distances. Our results show that increased taxonomic sampling improves
accuracy, precision, and resolution of reconstructed topologies across all
tested probabilistic software applications and all levels of missing data.
Under the RF metric, Bayesian inference applications were the most
consistent, accurate, and robust to variation in taxonomic sampling in all
tested conditions, especially at high levels of missing data, with little
difference in performance between the two tested programs. The MS metric
favored more resolved topologies that were generally produced by IQ-TREE.
Adding more taxa dramatically reduced performance disparities between
programs. Importantly, our results suggest that the RF metric penalizes
incorrectly resolved nodes (false positives) more severely than the MS
metric, which instead tends to penalize polytomies. If false positives are
to be avoided in systematics, Bayesian inference should be preferred over
maximum likelihood for the analysis of morphological data.
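
The remark about the RF metric and false positives can be illustrated with a small sketch. It is not the authors' pipeline: the nested-tuple tree encoding and the helper names (`leaves`, `splits`, `rf_distance`) are assumptions made for illustration, and the five-taxon trees are toy examples, not data from this dataset. The sketch computes the unweighted Robinson-Foulds distance as the number of non-trivial bipartitions found in exactly one of two trees. Under that definition, an unresolved node costs one missing split, whereas an incorrectly resolved node costs two (one missing split plus one false split).

```python
def leaves(tree):
    """Return the set of taxon names in a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, treated as unrooted."""
    all_taxa = leaves(tree)
    reference = min(all_taxa)  # fixes which side of each split is stored
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_taxa - clade
        # Keep only splits with at least two taxa on each side.
        if len(clade) > 1 and len(other) > 1:
            found.add(other if reference in clade else clade)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Unweighted Robinson-Foulds distance: splits present in exactly one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


# Toy trees on five taxa (illustrative only).
true_tree = (("A", "B"), ("C", ("D", "E")))
polytomy = (("A", "B"), ("C", "D", "E"))    # node left unresolved
wrong = (("A", "B"), (("C", "D"), "E"))     # node resolved incorrectly

print(rf_distance(true_tree, polytomy))
print(rf_distance(true_tree, wrong))
```

In this toy case the incorrectly resolved tree is twice as far from the true tree as the polytomy, which is the asymmetry the description refers to when it recommends methods that avoid false positives.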
Provider:
Dryad
Created:
2020-03-23



