Data from: Effective online Bayesian phylogenetics via sequential Monte Carlo with guided proposals
Source: DataCite Commons. Updated 2025-04-01; indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.n7n85
Description:
Modern infectious disease outbreak surveillance produces continuous
streams of sequence data which require phylogenetic analysis as data
arrives. Current software packages for Bayesian phylogenetic inference are
unable to quickly incorporate new sequences as they become available,
making them less useful for dynamically unfolding evolutionary stories.
This limitation can be addressed by applying a class of Bayesian
statistical inference algorithms called sequential Monte Carlo (SMC) to
conduct online inference, wherein new data can be continuously
incorporated to update the estimate of the posterior probability
distribution. In this paper we describe and evaluate several different
online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show
that proposing new phylogenies with a density similar to the Bayesian
prior suffers from poor performance, and we develop guided proposals that
better match the proposal density to the posterior. Furthermore, we show
that the simplest guided proposals can exhibit pathological behavior in
some situations, leading to poor results, and that the situation can be
resolved by heating the proposal density. The results demonstrate that
relative to the widely-used MCMC-based algorithm implemented in MrBayes,
the total time required to compute a series of phylogenetic posteriors as
sequences arrive can be significantly reduced by the use of OPSMC, without
incurring a significant loss in accuracy.
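
The description above mentions guided proposals and heating of the proposal density without giving details. As a rough illustration only, the sketch below shows what heating a discrete proposal can look like in a generic importance-sampling step. It is not the authors' OPSMC implementation: the function names, the per-candidate log scores, and the temperature parameter are all hypothetical, and the candidates simply stand in for possible placements of a new sequence.

```python
import math
import random


def heated_proposal(log_scores, temperature=1.0):
    """Turn per-candidate log scores into proposal probabilities.

    Probabilities are proportional to exp(log_score / temperature).
    temperature > 1 flattens the distribution ("heating").
    Hypothetical helper, not taken from the paper's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in log_scores]
    m = max(scaled)  # subtract the max for numerical stability
    unnorm = [math.exp(s - m) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def propose_and_weight(log_targets, log_scores, temperature, rng):
    """Draw one candidate and return (index, log importance weight).

    log_targets holds the unnormalized log target density of each
    candidate; the weight is target / proposal on the log scale.
    """
    probs = heated_proposal(log_scores, temperature)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    log_w = log_targets[idx] - math.log(probs[idx])
    return idx, log_w


def effective_sample_size(log_weights):
    """Standard ESS estimate: (sum w)^2 / sum(w^2)."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(x * x for x in w)


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up numbers: a guide score that disagrees with the target
    # on the last candidate.
    log_targets = [-1.0, -2.0, -1.5]
    log_scores = [-1.0, -2.0, -8.0]
    for temperature in (1.0, 4.0):
        draws = [propose_and_weight(log_targets, log_scores, temperature, rng)
                 for _ in range(1000)]
        ess = effective_sample_size([lw for _, lw in draws])
        print(f"temperature={temperature}: ESS={ess:.1f}")
```

A temperature of 1.0 leaves the guide scores unchanged, while larger values spread proposal mass over candidates that the guide scores would otherwise almost never select.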
Provider:
Dryad
Created:
2017-11-21



