Data from: A Bayesian supertree model for genome-wide species tree reconstruction
Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/4955919
Description:
Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make the most of all available data. Most species tree methods deal with a single process of phylogenetic discordance, namely gene duplication and loss, incomplete lineage sorting, or horizontal gene transfer. In this manuscript we address for the first time the problem of species tree inference from multilocus, genome-wide data sets in the presence of both gene duplication and loss and incomplete lineage sorting, and therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood supertrees to a hierarchical Bayesian model in which several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu, whose input is the posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations under complex phylogenomic models in order to evaluate the performance of our approach in comparison with other species tree approaches able to deal with multilabeled trees. Our method ranked best on both simulated and empirical data sets, in spite of ignoring branch lengths. Our results also show that, under complex simulation scenarios, gene tree parsimony is a competitive approach once its speed relative to more sophisticated models is taken into account.
Created:
2023-06-28



