Data from: More on the best evolutionary rate for phylogenetic analysis
DataCite Commons. Updated 2025-04-01. Indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.s342d
Description:
The accumulation of genome-scale molecular datasets for non-model taxa
brings us ever closer to resolving the tree of life of all living
organisms. However, despite the depth of data available, a number of
studies that each used thousands of genes have reported conflicting
results. The focus of phylogenomic projects must thus shift to more
careful experimental design. Even though we still have a limited
understanding of what are the best predictors of the phylogenetic
informativeness of a gene, there is wide agreement that one key factor is
its evolutionary rate; but there is no consensus as to whether the rates
derived as optimal in various analytical, empirical, and simulation
approaches have any general applicability. We here use simulations to
infer optimal rates in a set of realistic phylogenetic scenarios with
varying tree sizes, numbers of terminals, and tree shapes. Furthermore, we
study the relationship between the optimal rate and rate-variation among
sites and among lineages. Finally, we examine how well the predictions
made by a range of experimental-design methods correlate with the observed
performance in our simulations. We find that the optimal level of
divergence is surprisingly robust to differences in taxon sampling and
even to among-site and among-lineage rate variation as often encountered
in empirical datasets. This finding encourages the use of methods that
rely on a single optimal rate to predict a gene’s utility. Focusing on
correct recovery either of the most basal node in the phylogeny or of the
entire topology, the optimal rate is about 0.45 substitutions from root to
tip in average Yule trees and about 0.2 in difficult trees with short
basal and long apical branches, but all rates leading to divergence levels
between about 0.1 and 0.5 perform reasonably well. Testing the performance
of six methods that can be used to predict a gene’s utility against our
simulation results, we find that the probability of resolution,
signal-noise analysis, and Fisher information are good predictors of
phylogenetic informativeness, but they require specification of at least
part of a model tree. Likelihood quartet mapping also shows very good
performance, but only requires sequence alignments and is thus applicable
without making assumptions about the phylogeny. Although they are the
most commonly used methods for experimental design, geometric quartet
mapping and the integration of phylogenetic informativeness curves perform
rather poorly in our comparison. Instead of derived predictors of
phylogenetic informativeness, we suggest that the number of sites in a
gene that evolve at near-optimal rates (as inferred here) could be used
directly to prioritize genes for phylogenetic inference. In combination
with measures of model fit, especially with respect to compositional
biases and among-site and among-lineage rate variation, such an approach
has the potential to greatly improve marker choice and should be tested on
empirical data.
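
The suggestion above, counting the sites in a gene that evolve at near-optimal rates, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes per-site rates (substitutions per site per unit time) have already been estimated by some external tool, that the tree depth is given in the same time unit, and it uses the 0.1 to 0.5 root-to-tip divergence range from the description as the "near-optimal" window. The function names `count_near_optimal_sites` and `rank_genes` and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Root-to-tip divergence window (substitutions per site) that the description
# reports as performing reasonably well. Using it as a hard cutoff is an
# assumption made for this sketch.
NEAR_OPTIMAL_RANGE: Tuple[float, float] = (0.1, 0.5)


def count_near_optimal_sites(
    site_rates: Sequence[float],
    tree_depth: float,
    bounds: Tuple[float, float] = NEAR_OPTIMAL_RANGE,
) -> int:
    """Count sites whose expected root-to-tip divergence falls within bounds.

    site_rates: hypothetical per-site rate estimates (substitutions/site/time).
    tree_depth: root-to-tip depth of the tree in the same time unit.
    """
    low, high = bounds
    return sum(1 for rate in site_rates if low <= rate * tree_depth <= high)


def rank_genes(
    genes: Dict[str, Sequence[float]], tree_depth: float
) -> List[Tuple[str, int]]:
    """Rank genes by their number of near-optimal sites, highest first."""
    counts = {
        name: count_near_optimal_sites(rates, tree_depth)
        for name, rates in genes.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up rates purely for illustration; not taken from the dataset.
    example_genes = {
        "geneA": [0.05, 0.2, 0.3, 0.9],
        "geneB": [0.01, 0.02, 0.4],
        "geneC": [0.15, 0.25, 0.45, 0.35, 2.0],
    }
    for gene, n_sites in rank_genes(example_genes, tree_depth=1.0):
        print(gene, n_sites)
```

A raw count favours longer genes; whether to normalize by alignment length is a design choice the description leaves open.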
Provider:
Dryad
Created:
2017-05-25



