Data from: Maximum likelihood implementation of an isolation-with-migration model for three species
Source: DataONE | Updated: 2016-07-12 | Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
We develop a maximum likelihood (ML) method for estimating migration rates between species using genomic sequence data. A species tree is used to accommodate the phylogenetic relationships among three species, allowing for migration between the two sister species, while the third species is used as an out-group. A Markov chain characterization of the genealogical process of coalescence and migration is used to integrate out the migration histories at each locus analytically, whereas Gaussian quadrature is used to integrate over the coalescent times on each genealogical tree numerically. This is an extension of our earlier implementation of the symmetrical isolation-with-migration model for three species: it accommodates any mixture of loci with two or three sequences per locus and allows asymmetrical migration rates. Our implementation can accommodate tens of thousands of loci, making it feasible to analyze genome-scale data sets to test for gene flow. We calculate the posterior probabilities of gene trees at individual loci to identify genomic regions that are likely to have been transferred between species due to gene flow. We conduct a simulation study to examine the statistical properties of the likelihood ratio test for gene flow between the two in-group species and of the ML estimates of model parameters such as the migration rate. Including data from a third, out-group species is found to dramatically increase the power of the test and the precision of parameter estimation. We compiled and analyzed several genomic data sets from Drosophila fruit flies. Our analyses suggest no migration from D. melanogaster to D. simulans, and a significant amount of gene flow from D. simulans to D. melanogaster, at a rate of ~0.02 migrant individuals per generation. We discuss the utility of the multispecies coalescent model for species tree estimation, accounting for incomplete lineage sorting and migration.
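The abstract states that coalescent times are integrated out numerically with Gaussian quadrature. The sketch below illustrates only that step, under simplifying assumptions that are ours and not the authors': one population (no migration, no out-group), two sequences per locus, and an infinite-sites Poisson mutation model. The coalescent time t (in expected mutations per locus) is then exponential with rate 2/θ, and the number of differences x given t is Poisson with mean 2t. The substitution y = exp(-2t/θ) maps t in [0, ∞) to y in (0, 1], and the resulting integral is evaluated with Gauss-Legendre nodes. The function names and the choice of Gauss-Legendre are hypothetical; this page does not say which quadrature rule or transformation the authors' program uses.

```python
import math


def gauss_legendre(n):
    """Nodes and weights of n-point Gauss-Legendre quadrature on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        # Standard initial guess for the i-th root of P_n, refined by Newton's method.
        z = math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            p1, p2 = 1.0, 0.0
            # Three-term recurrence: after the loop p1 = P_n(z), p2 = P_{n-1}(z).
            for j in range(1, n + 1):
                p1, p2 = ((2 * j - 1) * z * p1 - (j - 1) * p2) / j, p1
            dp = n * (z * p1 - p2) / (z * z - 1.0)  # P_n'(z)
            z_old, z = z, z - p1 / dp
            if abs(z - z_old) < 1e-14:
                break
        nodes.append(z)
        weights.append(2.0 / ((1.0 - z * z) * dp * dp))
    return nodes, weights


def pairwise_diff_prob(x, theta, n_points=32):
    """P(x differences between two sequences), integrating over coalescent time.

    Hypothetical toy model (not the three-species IM model):
      t ~ Exponential(rate 2/theta), x | t ~ Poisson(2t).
    With y = exp(-2t/theta), the exponential density times dt equals -dy, so
    P(x) = integral over y in (0, 1) of Poisson(x; 2 t(y)) dy.
    """
    nodes, weights = gauss_legendre(n_points)
    total = 0.0
    for z, w in zip(nodes, weights):
        y = 0.5 * (z + 1.0)              # map node from [-1, 1] to (0, 1)
        t = -0.5 * theta * math.log(y)   # invert y = exp(-2t/theta)
        mean = 2.0 * t                   # expected differences: two branches of length t
        log_pmf = x * math.log(mean) - mean - math.lgamma(x + 1)
        total += 0.5 * w * math.exp(log_pmf)  # 0.5 is the Jacobian dy/dz
    return total


if __name__ == "__main__":
    theta, x = 2.0, 3
    print(pairwise_diff_prob(x, theta), theta ** x / (1 + theta) ** (x + 1))
```

For this toy model the integral has the closed form P(x) = θ^x / (1 + θ)^(x+1), so the quadrature can be checked directly: with θ = 2 and x = 3 both values should be close to 8/243 ≈ 0.0329. In the full model the exponential density would be replaced by the coalescent-time density obtained after integrating out migration histories, which this sketch does not attempt.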
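The abstract also describes a likelihood ratio test for gene flow between the two in-group species. The sketch below shows one way to turn such a test into a p-value, assuming the null hypothesis fixes a single migration rate at M = 0. Because 0 lies on the boundary of the parameter space, standard boundary-case asymptotics give a 50:50 mixture of a point mass at 0 and a χ² distribution with 1 degree of freedom, rather than χ²₁ itself. This page does not state whether the authors used this mixture or another calibration. The function names are hypothetical, and the two log-likelihoods are values the caller would obtain by fitting the models with and without migration.

```python
import math


def chi2_1_sf(x):
    """Upper tail P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    # X = Z^2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def lrt_gene_flow(loglik_null, loglik_alt):
    """Hypothetical LRT of M = 0 (no migration) against M > 0.

    Returns (statistic, p-value) using the 50:50 mixture of a point mass
    at 0 and chi-square(1) as the assumed null distribution.
    """
    # Clamp at 0: the alternative nests the null, so a negative value
    # can only come from imperfect numerical optimization.
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_value = 0.5 * chi2_1_sf(stat) if stat > 0 else 1.0
    return stat, p_value


if __name__ == "__main__":
    # Placeholder log-likelihoods, not results from the study.
    print(lrt_gene_flow(-1000.0, -997.0))
```

Under this mixture the 5% critical value is about 2.71 (the upper 10% point of χ²₁) instead of 3.84, so a naive χ²₁ comparison would be conservative.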
Created:
2016-07-12