Data from: Computational performance and statistical accuracy of *BEAST and comparisons with other methods
DataCite Commons. Updated 2025-06-01; indexed 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.02tf9
Description:
Under the multispecies coalescent model of molecular evolution, gene trees
have independent evolutionary histories within a shared species tree. In
comparison, supermatrix concatenation methods assume that gene trees share
a single common genealogical history, thereby equating gene coalescence
with species divergence. The multispecies coalescent is supported by
previous studies which found that its predicted distributions fit
empirical data, and that concatenation is not a consistent estimator of
the species tree. *BEAST, a fully Bayesian implementation of the
multispecies coalescent, is popular but computationally intensive, so the
increasing size of phylogenetic data sets is both a computational
challenge and an opportunity for better systematics. Using simulation
studies, we characterize the scaling behavior of *BEAST and enable
quantitative prediction of how increasing the number of loci affects
both computational performance and statistical accuracy. Follow-up
simulations over a wide range of parameters show that the statistical
performance of *BEAST relative to concatenation improves both as branch
length is reduced and as the number of loci is increased. Finally, using
simulations based on estimated parameters from two phylogenomic data sets,
we compare the performance of a range of species tree and concatenation
methods to show that using *BEAST with tens of loci can be preferable to
using concatenation with thousands of loci. Our results provide insight
into the practicalities of Bayesian species tree estimation, the number of
loci required to obtain a given level of accuracy and the situations in
which supermatrix or summary methods will be outperformed by the fully
Bayesian multispecies coalescent.
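To make the contrast between the multispecies coalescent and concatenation concrete, here is a minimal sketch for the simplest possible case. It is an illustration only, not part of this dataset and not *BEAST itself. It assumes three species with species tree ((A,B),C), one sampled lineage per species, a single non-recombining locus per draw, and an internal branch length t measured in coalescent units. Under those assumptions the probability that a gene tree matches the species tree is the standard result 1 - (2/3)e^(-t). The function names (p_concordant, simulate_gene_tree, concordance_rate) are hypothetical and chosen for this example.

```python
import math
import random


def p_concordant(t):
    """Analytic probability that a 3-taxon gene tree matches the species tree ((A,B),C).

    t is the internal branch length of the species tree in coalescent units.
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def simulate_gene_tree(t, rng):
    """Simulate one gene tree topology under the multispecies coalescent.

    Returns "AB|C" (concordant with the species tree) or "AC|B" / "BC|A" (discordant).
    """
    # Going back in time, lineages a and b enter the ancestral AB population;
    # a single pair of lineages coalesces at rate 1 per coalescent unit.
    if rng.expovariate(1.0) < t:
        return "AB|C"
    # Otherwise all three lineages reach the root population, where the first
    # coalescence is equally likely to join any of the three pairs.
    return rng.choice(["AB|C", "AC|B", "BC|A"])


def concordance_rate(t, n_loci, seed=1):
    """Fraction of n_loci simulated gene trees that match the species tree."""
    rng = random.Random(seed)
    hits = sum(simulate_gene_tree(t, rng) == "AB|C" for _ in range(n_loci))
    return hits / n_loci


if __name__ == "__main__":
    for t in (0.05, 0.5, 2.0):
        print(f"t={t}: analytic {p_concordant(t):.3f}, "
              f"simulated {concordance_rate(t, 100_000):.3f}")
```

The analytic concordance is roughly 0.366 at t = 0.05, 0.596 at t = 0.5 and 0.910 at t = 2.0, and the simulated fractions should land close to these. With short internal branches, most loci have a gene tree that disagrees with the species tree, which is the regime where assuming a single shared genealogy (as concatenation does) is most strained. This is consistent with, though not a reproduction of, the abstract's finding that *BEAST's advantage over concatenation grows as branch length is reduced.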
Provider:
Dryad
Created:
2015-12-09



