Data from: Genealogical working distributions for Bayesian model testing with phylogenetic uncertainty
Source: DataONE. Updated: 2015-11-03. Indexed: 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of “working distributions” to facilitate, or shorten, the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this end, we introduce a “working” distribution on the space of genealogies that enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different “working” distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation of the marginal likelihood when using PS and SS, whereas both GSS approaches still recover the correct marginal likelihood. The methods used in this paper are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
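To make the idea of stepping-stone sampling (SS) mentioned in the description concrete, here is a minimal sketch on a toy problem. It is not the authors' method and not BEAST's implementation: it covers plain SS only (no working distributions, no genealogies), and it assumes a conjugate normal model so that each power posterior can be sampled exactly instead of by MCMC and the estimate can be checked against a closed-form marginal likelihood. All function names, the power-law schedule for the powers, and the parameter values are illustrative assumptions.

```python
import math
import random


def log_likelihood(theta, n, s1, s2, sigma):
    """Log-likelihood of n i.i.d. N(theta, sigma^2) observations.

    Uses the sufficient statistics s1 = sum(y) and s2 = sum(y^2).
    """
    sq = s2 - 2.0 * theta * s1 + n * theta * theta  # sum of (y_i - theta)^2
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sq / (2.0 * sigma ** 2)


def exact_log_marginal(n, s1, s2, sigma, mu0, tau0):
    """Closed-form log marginal likelihood with prior theta ~ N(mu0, tau0^2)."""
    d1 = s1 - n * mu0                         # sum of (y_i - mu0)
    d2 = s2 - 2.0 * mu0 * s1 + n * mu0 ** 2   # sum of (y_i - mu0)^2
    quad = (d2 - tau0 ** 2 * d1 ** 2 / (sigma ** 2 + n * tau0 ** 2)) / sigma ** 2
    log_det = n * math.log(sigma ** 2) + math.log(1.0 + n * tau0 ** 2 / sigma ** 2)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


def stepping_stone(n, s1, s2, sigma, mu0, tau0, steps=32, draws=2000, rng=None):
    """Plain stepping-stone estimate of the log marginal likelihood.

    Assumption for this toy model: the power posterior at each beta is
    Gaussian, so it is sampled directly rather than with MCMC.
    """
    rng = rng or random.Random(0)
    # Illustrative schedule: powers from 0 (prior) to 1 (posterior),
    # concentrated near the prior.
    betas = [(k / steps) ** (1.0 / 0.3) for k in range(steps + 1)]
    log_z = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Power posterior: prior(theta) * likelihood(theta)^b_prev.
        prec = 1.0 / tau0 ** 2 + b_prev * n / sigma ** 2
        mean = (mu0 / tau0 ** 2 + b_prev * s1 / sigma ** 2) / prec
        sd = math.sqrt(1.0 / prec)
        terms = [
            (b_next - b_prev) * log_likelihood(rng.gauss(mean, sd), n, s1, s2, sigma)
            for _ in range(draws)
        ]
        # Log of the mean of likelihood^(b_next - b_prev), via log-sum-exp.
        m = max(terms)
        log_z += m + math.log(sum(math.exp(t - m) for t in terms) / draws)
    return log_z


# Synthetic data (hypothetical values, for illustration only).
rng = random.Random(42)
y = [rng.gauss(1.5, 1.0) for _ in range(20)]
n, s1, s2 = len(y), sum(y), sum(v * v for v in y)

estimate = stepping_stone(n, s1, s2, sigma=1.0, mu0=0.0, tau0=10.0, rng=rng)
exact = exact_log_marginal(n, s1, s2, sigma=1.0, mu0=0.0, tau0=10.0)
print(f"SS estimate: {estimate:.3f}, exact: {exact:.3f}")
```

In this sketch the first step draws from the prior itself, which is where the description locates the numerical difficulty with very diffuse priors. As the description puts it, GSS avoids this by using working distributions to shorten the integration.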
Created:
2015-11-03



