
Data for article: Generation of accurate, expandable phylogenomic trees with uDANCE

Source: DataCite Commons; updated 2025-05-12; indexed 2025-05-17
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/BCUM6P
Description:
Phylogenetic trees provide a framework for organizing evolutionary histories across the tree of life and aid downstream comparative analyses such as metagenomic identification. Methods that rely on single marker genes such as 16S rRNA have produced trees of limited accuracy with hundreds of thousands of organisms, whereas methods that use genome-wide data are not scalable to large numbers of genomes. We introduce uDance, a method that enables updatable genome-wide inference using a divide-and-conquer strategy that refines different parts of the tree independently and can build off of existing trees, with high accuracy and scalability. With uDance, we infer a species tree of roughly 200,000 genomes using 387 marker genes, totaling 42.5 billion amino acid residues.

Provider:
Harvard Dataverse
Created:
2023-06-15