
Data from: Species delimitation using genome-wide SNP data

Source: DataONE | Created: 2014-03-07 | Updated: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute-force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computational burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested.
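The model-comparison step described above reduces to arithmetic on log marginal likelihoods once those have been estimated. The sketch below is illustrative only and makes several assumptions not stated in the description: the log marginal likelihoods are taken as already computed by whatever estimator is used; the model names and values are hypothetical; and evidence is reported as 2 ln BF and read on the Kass and Raftery (1995) scale, a common convention that may not match the thresholds used in the study itself.

```python
# Minimal sketch: compare competing species delimitation models via Bayes factors.
# ASSUMPTION: log marginal likelihoods (lnML) are already estimated upstream;
# model names and lnML values used below are hypothetical.


def two_ln_bf(log_ml_a: float, log_ml_b: float) -> float:
    """Return 2 ln BF for model A over model B, given their log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def kass_raftery(two_ln: float) -> str:
    """Interpret a 2 ln BF value on the Kass & Raftery (1995) scale (assumed convention)."""
    if two_ln < 0:
        return "favours the other model"
    if two_ln < 2:
        return "not worth more than a bare mention"
    if two_ln < 6:
        return "positive"
    if two_ln < 10:
        return "strong"
    return "very strong"


def rank_models(log_mls: dict[str, float]) -> list[tuple[str, float, float]]:
    """Rank models by lnML (best first) and give each model's 2 ln BF relative to the best."""
    ordered = sorted(log_mls.items(), key=lambda kv: kv[1], reverse=True)
    best_lml = ordered[0][1]
    return [(name, lml, two_ln_bf(best_lml, lml)) for name, lml in ordered]


if __name__ == "__main__":
    # Hypothetical lnML values for three candidate delimitations.
    hypothetical = {"lump_AB": -5210.4, "current_taxonomy": -5203.1, "split_C": -5198.7}
    for name, lml, bf in rank_models(hypothetical):
        print(f"{name:18s} lnML={lml:9.1f}  2lnBF vs best={bf:5.1f}  {kass_raftery(bf)}")
```

Reporting every model against the best-supported one keeps all Bayes factors non-negative and makes the ranking easy to read; pairwise comparisons between any two models can still be made with `two_ln_bf` directly.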
