
DEPP: Deep learning enables extending species trees using single genes

Source: DataCite Commons (updated 2026-03-04, indexed 2025-06-15)
Download link:
https://datadryad.org/dataset/doi:10.6076/D1JS3Z
Description:
Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. However, existing placement methods have a fundamental limitation: they assume that query sequences have evolved using specific models directly on the reference phylogeny. Thus, they can place single-gene data (e.g., 16S rRNA amplicons) onto their own gene tree. This practice is a proxy for a more ambitious goal: extending a (genome-wide) species tree given data from individual genes. No algorithm currently addresses this challenging problem. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. We show that DEPP updates the multi-locus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can achieve the long-standing goal of combining 16S and metagenomic data onto a single tree, enabling community structure analyses that were previously impossible and producing robust patterns.

Provider:
Dryad
Created:
2021-06-04