Data from: Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L.
Source: Mendeley Data
Updated: 2024-06-25
Indexed: 2024-06-27
Download link:
https://zenodo.org/records/5018052
Description:
Several well-documented evolutionary processes are known to cause conflict between species-level phylogenies and gene-level phylogenies. Three of the most challenging processes for species tree inference are incomplete lineage sorting, hybridization and gene duplication, the last of which may result in unwarranted comparisons of paralogous genes. Several existing methods have dealt with these processes, but none has yet been able to untangle all three at once. Here, we propose a stepwise method by which these processes can be discerned using information on genomic location coupled with coalescent simulations. In the first step, highly discordant genes within genomic blocks (putative paralogs) are identified and excluded from the data set. In the second step, blocks of linked genes are grouped according to their hybrid history. Existing multispecies coalescent software can then be applied to recover the principal tree(s) that make up the species tree/network without violating the underlying model. The potential of the approach is evaluated on simulated data derived from a species network of nine species. One of these species is of hybrid origin, and the network also contains a single-gene duplication that leads to paralogous comparisons. We apply our method to an empirical set of 12 genes from 7 species sampled in the plant genus Medicago that display phylogenetic discordance. We identify the causes of the discordance and demonstrate that the Medicago orbicularis lineage experienced an episode of ancient hybridization. Our results suggest that this approach is a promising new way to explore phylogenetic sequence data. It can significantly improve species tree inference in the presence of hybridization, undetected paralogy or other causes of extremely discordant gene trees.
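The abstract does not specify the discordance measure, the simulator or the grouping criterion. The sketch below is therefore only a hypothetical illustration of the two-step logic, not the authors' implementation. It makes the following assumptions:

* Discordance is measured as the rooted Robinson-Foulds (RF) distance between a gene tree and a reference tree. The reference could, for example, represent the other genes in the same genomic block; that choice is also an assumption.
* A null distribution of RF distances, from gene trees simulated under the multispecies coalescent, is produced elsewhere and passed in as a plain list.
* In step 1, genes whose empirical p-value falls below a hypothetical cutoff `alpha` are flagged as putative paralogs.
* Step 2 is approximated by assigning each genomic block to the candidate parental tree that its gene trees are closest to.

All function names, trees and null values are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Union

# Hypothetical representation: a rooted tree as nested tuples of leaf names,
# e.g. ((("A", "B"), "C"), ("D", "E")). Child order does not matter.
Tree = Union[str, tuple]


def _collect(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaf set below `tree`, recording each internal clade in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_collect(child, out) for child in tree))
    out.add(leaves)
    return leaves


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clades of a rooted tree (leaves and the root clade excluded)."""
    out: Set[FrozenSet[str]] = set()
    root = _collect(tree, out)
    out.discard(root)
    return out


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: clades found in one tree but not the other."""
    return len(clades(t1) ^ clades(t2))


def empirical_p(observed: int, null: List[int]) -> float:
    """Share of null distances >= observed, with a +1 correction so p is never 0."""
    return (1 + sum(d >= observed for d in null)) / (1 + len(null))


def flag_putative_paralogs(gene_trees: Dict[str, Tree], reference: Tree,
                           null: List[int], alpha: float = 0.05) -> List[str]:
    """Step 1 sketch: genes more discordant than the coalescent null predicts."""
    return sorted(name for name, tree in gene_trees.items()
                  if empirical_p(rf_distance(tree, reference), null) < alpha)


def group_blocks(block_trees: Dict[str, List[Tree]],
                 parental: Dict[str, Tree]) -> Dict[str, str]:
    """Step 2 sketch: assign each block to the candidate tree its genes fit best."""
    assignment = {}
    for block, trees in block_trees.items():
        # Total RF distance from the block's gene trees to each candidate tree.
        score = {name: sum(rf_distance(t, p) for t in trees)
                 for name, p in parental.items()}
        assignment[block] = min(score, key=score.get)
    return assignment


# Toy inputs, invented for illustration (not taken from the Medicago data set).
reference = ((("A", "B"), "C"), ("D", "E"))
genes = {
    "g1": ((("A", "B"), "C"), ("D", "E")),  # RF = 0
    "g2": ((("A", "C"), "B"), ("D", "E")),  # RF = 2
    "g3": ((("A", "D"), "E"), ("B", "C")),  # RF = 6
}
# Stand-in for RF distances of gene trees simulated under the coalescent.
null = [0] * 12 + [2] * 6 + [4] * 2

parental = {"P1": reference, "P2": ((("A", "B"), "D"), ("C", "E"))}
blocks = {
    "block1": [genes["g1"], ((("B", "A"), "C"), ("E", "D"))],
    "block2": [((("A", "B"), "D"), ("C", "E"))],
}

print(flag_putative_paralogs(genes, reference, null))  # ['g3']
print(group_blocks(blocks, parental))  # {'block1': 'P1', 'block2': 'P2'}
```

The sketch compares each gene against simulated distances rather than a fixed RF cutoff. That way the threshold tracks how much discordance incomplete lineage sorting alone is expected to produce on a given tree, so only genes beyond that range are treated as putative paralogs. A real analysis would more likely rely on an established phylogenetics library and coalescent simulator than on this hand-rolled tree representation.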
Created: 2023-06-28



