Data for: Theoretical and practical considerations when using retroelement insertions to estimate species trees in the anomaly zone
Source: DataCite Commons. Updated 2025-06-01; indexed 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.44j0zpcf2
Abstract:
A potential shortcoming of concatenation methods for species tree
estimation is their failure to account for incomplete lineage
sorting. Coalescent methods address this problem but make various
assumptions that, if violated, can result in worse performance than
concatenation. Given the challenges of analyzing DNA sequences
with both concatenation and coalescent methods, retroelement insertions
(RIs) have emerged as powerful phylogenomic markers for species tree
estimation. Here, we show that two recently proposed
quartet-based methods, SDPquartets and ASTRAL_BP, are
statistically consistent estimators of the unrooted species tree topology
under the coalescent when RIs follow a neutral infinite-sites model of
mutation and the expected number of new RIs per generation is constant
across the species tree. The accuracy of these (and other)
methods for inferring species trees from RIs has yet to be assessed on
simulated data sets, where the true species tree topology is
known. Therefore, we evaluated eight methods on RIs simulated
from four model species trees, all of which have short branches and at
least three of which are in the anomaly zone. In our simulation
study, ASTRAL_BP and SDPquartets always recovered the
correct species tree topology when given a sufficiently large
number of RIs, as predicted. A distance-based method (ASTRID_BP)
and Dollo parsimony also performed well in recovering the species tree
topology. In contrast, unordered, polymorphism, and Camin-Sokal
parsimony, as well as an approach based on minimizing deep coalescence (MDC), typically failed to recover
the correct species tree topology in anomaly zone situations with more
than four ingroup taxa. Of the methods studied, only ASTRAL_BP
automatically estimates internal branch lengths (in coalescent units) and
support values (i.e., local posterior probabilities). We examined
the accuracy of branch length estimation, finding that estimated lengths
were accurate for short branches but upwardly biased
otherwise. This led us to derive the maximum likelihood estimate of branch
length for the case in which RIs, rather than binary gene trees, are given
as input; this corrected formula produced accurate estimates of branch
lengths in our simulation study, provided that a sufficiently large number
of RIs were given as input. Lastly, we evaluated the impact of
data quantity on species tree estimation by repeating the above
experiments with input sizes varying from 100 to 100,000
parsimony-informative RIs. We found that, when given just 1,000
parsimony-informative RIs as input, ASTRAL_BP successfully reconstructed
major clades (i.e., clades separated by branches longer than 0.3 coalescent units)
with high support and identified rapid radiations (i.e., shorter connected
branches), although not their precise branching order. The local
posterior probability was effective for controlling false-positive
branches in these scenarios.
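The anomaly zone can be made concrete with a small exact calculation. The sketch below is not taken from the study. It assumes the standard multispecies coalescent on a four-taxon caterpillar species tree (((A,B),C),D). Here t_ab is the length of the internal branch ancestral to A and B, and t_abc is the length of the internal branch ancestral to A, B and C, both in coalescent units. The function name is hypothetical. The function returns three probabilities: that of the rooted gene tree matching the species tree, that of the symmetric gene tree ((A,B),(C,D)), and that of the unrooted quartet AB|CD.

```python
import math


def caterpillar_gene_tree_probs(t_ab: float, t_abc: float) -> dict:
    """Exact gene tree probabilities under the multispecies coalescent for the
    species tree (((A,B),C),D); branch lengths are in coalescent units."""
    e1 = math.exp(-t_ab)       # A and B fail to coalesce in their ancestral branch
    e2 = math.exp(-t_abc)      # 2 lineages fail to coalesce in the ABC branch
    e3 = math.exp(-3 * t_abc)  # 3 lineages: no coalescence in the ABC branch
    # Three lineages (A, B, C) entering the ABC branch leave it as 1, 2 or 3 lineages.
    g31 = 1 - 1.5 * e2 + 0.5 * e3
    g32 = 1.5 * (e2 - e3)
    g33 = e3
    # Case 1 (prob 1 - e1): A and B coalesce first. The gene tree matches if
    # (AB) and C coalesce in the ABC branch, or pair first (1/3) in the root.
    match_1 = (1 - e2) + e2 / 3
    sym_1 = e2 / 3             # (AB) stays separate and C pairs with D in the root
    # Case 2 (prob e1): A and B enter the ABC branch separately. The first merger
    # must be A-B (1/3); the remaining mergers in the root are uniformly random.
    match_2 = g31 / 3 + g32 / 9 + g33 / 18
    sym_2 = (g32 + g33) / 9
    return {
        "matching": (1 - e1) * match_1 + e1 * match_2,
        "symmetric": (1 - e1) * sym_1 + e1 * sym_2,
        # The unrooted quartet depends only on t_ab: 1 - (2/3) exp(-t_ab).
        "unrooted_AB|CD": 1 - (2 / 3) * e1,
    }


if __name__ == "__main__":
    for name, p in caterpillar_gene_tree_probs(0.1, 0.1).items():
        print(f"{name}: {p:.3f}")
    # matching: 0.104
    # symmetric: 0.128
    # unrooted_AB|CD: 0.397
```

With both branches at 0.1 coalescent units, the symmetric gene tree is more probable than the matching one, so this species tree lies in the anomaly zone. The unrooted quartet AB|CD, by contrast, keeps probability 1 - (2/3)e^{-t_ab}, which exceeds 1/3 for every t_ab > 0. That contrast is the general reason unrooted quartets are attractive in the anomaly zone. The study's consistency result for RIs additionally relies on the infinite-sites and constant-rate assumptions stated above.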
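Quartet support from RIs, and a coalescent branch length derived from it, can be sketched for a single quartet. This is a minimal illustration under stated assumptions, not the ASTRAL_BP or SDPquartets code, and the function names are hypothetical. RIs are supplied as a 0/1 presence/absence matrix. An RI counts as informative for a quartet only when it splits the four taxa 2|2. The branch length is obtained by inverting the standard coalescent relation for binary gene trees, P(main resolution) = 1 - (2/3)e^{-t}. Per the abstract, treating RIs as if they were binary gene trees gave estimates that were upwardly biased except on short branches. The authors' corrected maximum likelihood formula is not reproduced here.

```python
import math


def quartet_support(ri_matrix, taxa, quartet):
    """Count RIs supporting each unrooted resolution of a four-taxon quartet.

    ri_matrix: one 0/1 row per RI (1 = insertion present), columns in `taxa` order.
    Only RIs present in exactly two of the four taxa (a 2|2 split) are counted.
    """
    a, b, c, d = quartet
    cols = [taxa.index(t) for t in quartet]
    counts = {f"{a}{b}|{c}{d}": 0, f"{a}{c}|{b}{d}": 0, f"{a}{d}|{b}{c}": 0}
    for row in ri_matrix:
        present = {q for q, i in zip(quartet, cols) if row[i] == 1}
        if len(present) != 2:
            continue  # present in 0, 1, 3 or 4 taxa: uninformative for this quartet
        # A 2|2 split supports the same resolution whichever side carries the RI,
        # so express it by the side that contains taxon `a`.
        side = present if a in present else set(quartet) - present
        partner = (side - {a}).pop()
        rest = [q for q in quartet if q not in side]
        counts[f"{a}{partner}|{rest[0]}{rest[1]}"] += 1
    return counts


def branch_length_from_quartet_freq(n_main, n_total):
    """Invert P(main resolution) = 1 - (2/3) * exp(-t), t in coalescent units.

    Returns 0.0 if the main frequency is at most 1/3 and infinity if it is 1.
    """
    p = n_main / n_total
    if p <= 1 / 3:
        return 0.0
    if p >= 1.0:
        return math.inf
    return -math.log(1.5 * (1.0 - p))


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    ris = (
        [[1, 1, 0, 0]] * 5                # AB|CD
        + [[0, 0, 1, 1]]                  # AB|CD, carried by the other side
        + [[1, 0, 1, 0]] * 2              # AC|BD
        + [[0, 1, 1, 0]] * 2              # AD|BC
        + [[1, 0, 0, 0], [1, 1, 1, 0]]    # uninformative for this quartet
    )
    counts = quartet_support(ris, taxa, ("A", "B", "C", "D"))
    print(counts)  # {'AB|CD': 6, 'AC|BD': 2, 'AD|BC': 2}
    best = max(counts, key=counts.get)
    t_hat = branch_length_from_quartet_freq(counts[best], sum(counts.values()))
    print(best, round(t_hat, 3))  # AB|CD 0.511
```

Here 6 of the 10 informative RIs support AB|CD, which gives t = -ln(0.6), about 0.511 coalescent units. The estimate is clipped to 0 when the main frequency is at most 1/3, because the model cannot produce a main frequency below that value.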
Providing institution:
Dryad
Created:
2021-11-25



