Data from: Detecting hybridization by likelihood calculation of gene tree extra lineages given explicit models

Source: DataONE. Updated: 2017-06-29. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Description:
Gene tree discordance with respect to a species tree is commonly attributed to deep coalescence (also known as incomplete lineage sorting [ILS]), as well as to other evolutionary processes such as hybridization, horizontal gene transfer and gene duplication. Among these, deep coalescence is usually quantified as the number of extra lineages and has been studied as the principal source of discordance among gene trees, while the other processes that could contribute to gene tree discordance have not been fully explored. This is an important issue for hybridization because interspecific gene flow is well documented and widespread across many plant and animal groups. Here, we propose a new way to detect gene flow when ILS is present. It evaluates the likelihood of different models with various levels of gene flow by comparing the expected gene tree discordance, measured as the number of extra lineages. The approach consists of proposing a model, simulating a set of gene trees to infer the distribution of extra lineages expected under that model, and calculating a likelihood function by comparing the fit of the real gene trees to the simulated distribution. To count extra lineages, the gene tree is first reconciled within the species tree; then, for a given species tree branch, the number of extra lineages is the number of gene lineages minus one. We developed a set of R functions that parallelize the simulation software and compare hypotheses via a likelihood ratio test, providing a fast and simple way to evaluate the presence of gene flow when ILS is present. Our results show high accuracy under very challenging scenarios of strong ILS and low levels of gene flow, even with a modest dataset of five to ten loci and five to ten individuals per species. We present a powerful and fast method to detect hybridization in the presence of ILS. We discuss its advantages for large datasets (such as genome-scale data) and identify possible issues that should be explored with more complex models in future studies.
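
The likelihood step described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and it makes several assumptions that are not stated in the description: each gene tree is summarized by a single total extra-lineage count, the simulated distribution is turned into probabilities with add-one smoothing, and the likelihood ratio statistic is compared against a chi-square distribution with one degree of freedom. All function names and the toy counts are hypothetical and serve only to show the shape of the calculation.

```python
import math
from collections import Counter


def smoothed_pmf(simulated_counts, max_count):
    """Empirical probabilities for counts 0..max_count from simulated gene trees.

    Add-one smoothing is an assumption of this sketch; it keeps the
    probability of a count never seen in the simulations above zero.
    """
    tally = Counter(simulated_counts)
    total = len(simulated_counts) + (max_count + 1)
    return [(tally.get(k, 0) + 1) / total for k in range(max_count + 1)]


def log_likelihood(observed_counts, simulated_counts, max_count):
    """Sum of log probabilities of the observed extra-lineage counts."""
    pmf = smoothed_pmf(simulated_counts, max_count)
    return sum(math.log(pmf[k]) for k in observed_counts)


def likelihood_ratio_test(observed, sim_null, sim_alt):
    """Compare a no-gene-flow model (null) against a gene-flow model (alt).

    Assumption: the statistic is referred to a chi-square distribution
    with 1 degree of freedom (one extra gene-flow parameter).
    """
    # Use the same support for both models so the smoothing is comparable.
    max_count = max(max(observed), max(sim_null), max(sim_alt))
    ll_null = log_likelihood(observed, sim_null, max_count)
    ll_alt = log_likelihood(observed, sim_alt, max_count)
    # Simulated likelihoods are noisy, so the raw statistic can be negative;
    # it is clipped at zero here.
    statistic = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy, made-up counts for illustration only (not data from the study).
observed = [3, 4, 3, 5, 4]
sim_null = [0, 1, 1, 2, 1, 0, 2, 1, 1, 0]
sim_alt = [3, 4, 4, 5, 3, 4, 3, 5, 4, 4]

statistic, p_value = likelihood_ratio_test(observed, sim_null, sim_alt)
print(f"LRT statistic = {statistic:.3f}, p-value = {p_value:.5f}")
```

In practice the two simulated lists would come from gene trees generated under each proposed model, and the observed list from reconciling the real gene trees within the species tree. A small p-value would favor the model with gene flow under the assumptions listed above.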

Created:
2017-06-29