DEPP: Deep learning enables extending species trees using single genes
Source: DataCite Commons. Updated: 2026-03-04. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.6076/D1JS3Z
Description:
Placing new sequences onto reference phylogenies is increasingly used for
analyzing environmental samples, especially microbiomes. However, existing
placement methods have a fundamental limitation: they assume that query
sequences have evolved under specific models directly on the reference
phylogeny. Thus, they can place single-gene data (e.g., 16S rRNA
amplicons) only onto their own gene tree. This practice is a proxy for a more
ambitious goal: extending a (genome-wide) species tree given data from
individual genes. No algorithm currently addresses this challenging
problem. Here, we introduce Deep-learning Enabled Phylogenetic Placement
(DEPP), an algorithm that learns to extend species trees using single
genes without pre-specified models. We show that DEPP updates the
multi-locus microbial tree-of-life with single genes with high accuracy.
We further demonstrate that DEPP can achieve the long-standing goal of
combining 16S and metagenomic data onto a single tree, enabling community
structure analyses that were previously impossible and producing robust
patterns.
Provider:
Dryad
Created:
2021-06-04



