Data from: The influence of gene flow on species tree estimation: a simulation study
Source: Mendeley Data · Updated 2024-06-25 · Indexed 2024-06-27
Download link:
https://zenodo.org/records/4940796
Summary:
Gene flow among populations or species and incomplete lineage sorting (ILS) are two evolutionary processes responsible for generating gene tree discordance and therefore hindering species tree estimation. Numerous studies have evaluated the impacts of ILS on species tree inference, yet the ramifications of gene flow on species trees remain less studied. Here, we simulate and analyze multilocus sequence data generated with ILS and gene flow to quantify their impacts on species tree inference. We characterize species tree estimation errors under various models of gene flow, such as the isolation-migration model, the n-island model, and gene flow between non-sister species or involving ancestral species, as well as cases in which species boundaries are crossed by a single gene copy (allelic introgression) or by a single migrant individual. These patterns of gene flow are explored on species trees of different sizes (4 vs. 10 species), at different time scales (shallow vs. deep), and with different migration rates. Species trees are estimated with the multispecies coalescent model using Bayesian methods (BEST and *BEAST) and with a summary statistic approach (MPEST) that facilitates phylogenomic-scale analysis. Even in cases where the topology of the species tree is estimated with high accuracy, we find that gene flow can result in overestimates of population sizes (species tree dilation) and underestimates of species divergence times (species tree compression). Signatures of migration events remain present in the distribution of coalescent times for gene trees, and with sufficient data it is possible to identify those loci that have crossed species boundaries. These results highlight the need for careful sampling design in phylogeographic and species delimitation studies, as gene flow, introgression, or incorrect sample assignments can bias the estimation of the species tree topology and of parameter estimates such as population sizes and divergence times.
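The compression effect described above can be illustrated with a minimal sketch. It assumes a two-species isolation-with-migration model with one sampled lineage per species, time measured in coalescent units (a pair of lineages in the same population coalesces at rate 1), a symmetric per-lineage migration rate `m` before the split time `tau`, and an ancestral population with the same coalescence rate. The function names `coalescence_time` and `simulate` are hypothetical; this is not the simulation pipeline used in the study.

```python
import random
import statistics


def coalescence_time(tau, m, rng):
    """Backward-in-time coalescence time of one lineage from species A and one
    from species B under an assumed two-population isolation-with-migration model.

    Before the split time `tau`, each lineage independently moves to the other
    population at rate `m`, and two lineages in the same population coalesce at
    rate 1. After `tau`, both lineages sit in one ancestral population.
    """
    pop = [0, 1]  # lineage 0 starts in species A, lineage 1 in species B
    t = 0.0
    while True:
        coal_rate = 1.0 if pop[0] == pop[1] else 0.0
        total = coal_rate + 2.0 * m
        if total == 0.0:
            break  # no migration and separate populations: nothing happens before tau
        dt = rng.expovariate(total)
        if t + dt >= tau:
            break  # next event would fall after the split (memoryless, so restart at tau)
        t += dt
        if rng.random() * total < coal_rate:
            return t  # coalescence before the split, only possible after a migration
        i = rng.randrange(2)  # otherwise one of the two lineages migrates
        pop[i] = 1 - pop[i]
    # Ancestral population: the pair coalesces at rate 1 after the split
    return tau + rng.expovariate(1.0)


def simulate(tau, m, n_loci, seed=1):
    """Coalescence times for `n_loci` independent loci."""
    rng = random.Random(seed)
    return [coalescence_time(tau, m, rng) for _ in range(n_loci)]


if __name__ == "__main__":
    tau = 1.0
    for m in (0.0, 0.5):
        times = simulate(tau, m, n_loci=20000)
        print(f"m={m}: mean={statistics.mean(times):.3f}, min={min(times):.3f}")
```

Without migration every locus coalesces after `tau`, so the mean is close to `tau + 1` and the minimum across loci never falls below the split time. With migration, some loci coalesce before `tau`, which lowers both the mean and the minimum; a method that reads divergence time from these coalescent times will therefore tend to place the split too recently.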
Created: 2023-06-28



