Data from: Robustness to divergence time underestimation when inferring species trees from estimated gene trees
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.65tn4
Description:
To infer species trees from gene trees estimated from phylogenomic data
sets, tractable methods are needed that can handle dozens to hundreds of
loci. We examine several computationally efficient approaches—MP-EST,
STAR, STEAC, STELLS, and STEM—for inferring species trees from gene trees
estimated using maximum likelihood (ML) and Bayesian approaches. Among the
methods examined, we found that topology-based methods often performed
better using ML gene trees and methods employing coalescent times
typically performed better using Bayesian gene trees, with MP-EST, STAR,
STEAC, and STELLS outperforming STEM under most conditions. We examine why
the STEM tree (also called GLASS or Maximum Tree) is less accurate on
estimated gene trees by comparing estimated and true coalescence times,
performing species tree inference using simulations, and analyzing a great
ape data set while tracking false positive and false negative rates for
inferred clades. We find that although true coalescence times are more
ancient than speciation times under the multispecies coalescent model,
estimated coalescence times are often more recent than speciation times.
This underestimation can lead to increased bias and lack of resolution
with increased sampling (either alleles or loci) when gene trees are
estimated with ML. The problem appears to be less severe using Bayesian
gene-tree estimates.
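
The effect described above can be illustrated with a toy sketch. It assumes a single pair of species, a speciation time `tau` in coalescent units, and a hypothetical zero-mean Gaussian error model for estimated coalescence times; none of these names or parameter values come from the data set, and the code is not the STEM or GLASS implementation. It only applies the minimum-across-loci rule that the GLASS / Maximum Tree approach is based on.

```python
import random


def simulate(num_loci, tau=1.0, noise_sd=0.5, seed=0):
    """Return (min true time, min estimated time) across loci for one species pair.

    tau      -- assumed speciation time (illustrative value, coalescent units)
    noise_sd -- assumed spread of gene-tree estimation error (hypothetical)
    """
    rng = random.Random(seed)
    true_times, est_times = [], []
    for _ in range(num_loci):
        # Under the multispecies coalescent, two lineages from different
        # species can only coalesce before the split: tau plus a waiting time.
        t = tau + rng.expovariate(1.0)
        true_times.append(t)
        # Hypothetical estimation error; estimated times are truncated at 0.
        est_times.append(max(0.0, t + rng.gauss(0.0, noise_sd)))
    # Minimum-across-loci rule: take the most recent coalescence time seen.
    return min(true_times), min(est_times)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        true_min, est_min = simulate(n)
        print(n, round(true_min, 3), round(est_min, 3))
```

In this sketch the minimum of the true times can never fall below `tau`, while the minimum of the noisy estimates can, and it can only stay the same or decrease as more loci are added with the same seed. That mirrors the abstract's point that added sampling can increase bias when coalescence times are underestimated.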
Provider:
Dryad
Created:
2013-08-21



