
Data from: Computational performance and statistical accuracy of *BEAST and comparisons with other methods

Source: DataCite Commons. Updated 2025-06-01; indexed 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.02tf9
Description:
Under the multispecies coalescent model of molecular evolution, gene trees have independent evolutionary histories within a shared species tree. In comparison, supermatrix concatenation methods assume that gene trees share a single common genealogical history, thereby equating gene coalescence with species divergence. The multispecies coalescent is supported by previous studies which found that its predicted distributions fit empirical data, and that concatenation is not a consistent estimator of the species tree. *BEAST, a fully Bayesian implementation of the multispecies coalescent, is popular but computationally intensive, so the increasing size of phylogenetic data sets is both a computational challenge and an opportunity for better systematics. Using simulation studies, we characterize the scaling behavior of *BEAST, and enable quantitative prediction of the impact increasing the number of loci has on both computational performance and statistical accuracy. Follow-up simulations over a wide range of parameters show that the statistical performance of *BEAST relative to concatenation improves both as branch length is reduced and as the number of loci is increased. Finally, using simulations based on estimated parameters from two phylogenomic data sets, we compare the performance of a range of species tree and concatenation methods to show that using *BEAST with tens of loci can be preferable to using concatenation with thousands of loci. Our results provide insight into the practicalities of Bayesian species tree estimation, the number of loci required to obtain a given level of accuracy and the situations in which supermatrix or summary methods will be outperformed by the fully Bayesian multispecies coalescent.

Provider:
Dryad
Created:
2015-12-09