DEPP: Deep learning enables extending species trees using single genes
Source: Mendeley Data | Updated: 2024-04-13 | Indexed: 2024-06-27
Download link:
https://datadryad.org/stash/dataset/doi:10.6076/D14G68
Description:
Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. However, existing placement methods have a fundamental limitation: they assume that query sequences have evolved using specific models directly on the reference phylogeny. Thus, they can place single-gene data (e.g., 16S rRNA amplicons) onto their own gene tree. This practice is a proxy for a more ambitious goal: extending a (genome-wide) species tree given data from individual genes. No algorithm currently addresses this challenging problem. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. We show that DEPP updates the multi-locus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can achieve the long-standing goal of combining 16S and metagenomic data onto a single tree, enabling community structure analyses that were previously impossible and producing robust patterns.
Created:
2023-11-16



