Re-evaluating deep neural networks for phylogeny estimation: the issue of taxon sampling
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.rbnzs7h91
Description:
Deep neural networks (DNNs) are powerful machine learning models that are
widely used for classification problems, and have recently been proposed
for quartet tree phylogeny estimation (Suvorov et al., Systematic Biology
2020, and Zou et al., Molecular Biology and Evolution 2020). Here we present
a study evaluating recently trained DNNs (from Zou et al., MBE 2020) in
comparison to a collection of standard phylogeny estimation methods,
including UPGMA, neighbor joining, maximum parsimony, and maximum
likelihood, on a heterogeneous collection of 20-sequence datasets
simulated under the same models that were used to train the DNNs, and also
under similar conditions but with higher rates of evolution. Our study
shows that using DNNs with quartet amalgamation (to combine quartet trees
into a tree on the full dataset) is more accurate only than UPGMA, and is
less accurate than all the other standard phylogeny estimation methods
we explore (maximum likelihood, neighbor joining, and maximum parsimony).
We further find that while DNNs can provide good quartet tree accuracy,
some standard phylogeny estimation methods match or improve on DNNs for
quartet accuracy, especially, but not exclusively, when used in a global
manner (i.e., the tree on the full dataset is computed and then the
induced quartet trees are extracted from the full tree). Thus, our study
provides evidence that a major challenge impacting the utility of current
DNNs for phylogeny estimation is their restriction to estimating quartet
trees, which must subsequently be combined into a tree on the full dataset.
In contrast, global methods -- i.e., those that estimate trees from the
full set of sequences -- are able to benefit from taxon sampling, and
hence have higher accuracy on large datasets.
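
The "global" quartet evaluation described above (compute a tree on the full dataset, then extract its induced quartet trees) can be illustrated with a minimal Python sketch. This is not the code used in the study. It assumes fully resolved trees written as nested tuples of taxon names, and the function names (leaf_sets, induced_quartet, quartet_accuracy) are hypothetical, chosen only for this illustration.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return (all leaves of tree, list of leaf sets, one per subtree).

    Assumption: a tree is a nested tuple; anything that is not a tuple is a leaf.
    Each subtree's leaf set defines one side of a bipartition of the taxa.
    """
    if not isinstance(tree, tuple):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def induced_quartet(clades, four):
    """Return the quartet topology on `four` as a set of two pairs, or None.

    A bipartition that puts exactly two of the four taxa on one side
    fixes the induced quartet tree (for example AB|CD).
    """
    four = frozenset(four)
    for clade in clades:
        inside = clade & four
        if len(inside) == 2:
            return frozenset([inside, four - inside])
    return None  # unresolved; does not occur for fully resolved trees


def quartet_accuracy(true_tree, est_tree):
    """Fraction of 4-taxon subsets whose induced topology agrees in both trees."""
    true_leaves, true_clades = leaf_sets(true_tree)
    est_leaves, est_clades = leaf_sets(est_tree)
    if true_leaves != est_leaves:
        raise ValueError("trees must be on the same taxon set")
    quartets = list(combinations(sorted(true_leaves), 4))
    matches = sum(
        induced_quartet(true_clades, q) == induced_quartet(est_clades, q)
        for q in quartets
    )
    return matches / len(quartets)


# Toy example on five taxa: the estimated tree misplaces B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
est_tree = ((("A", "C"), "B"), ("D", "E"))
print(quartet_accuracy(true_tree, est_tree))
```

In the toy example, three of the five induced quartets agree between the two trees. A 20-sequence dataset has 4845 four-taxon subsets, so this brute-force enumeration stays cheap at the dataset size considered here.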
Provider:
Dryad
Created:
2020-08-27



