Data from: SuperFine: fast and accurate supertree estimation
Source: DataONE | Updated: 2011-05-16 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, towards the eventual goal of estimating a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on datasets at the low end of this range. One approach to estimating a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for datasets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large datasets with thousands of sequences. Furthermore, SuperFine-boosted MRP (Matrix Representation with Parsimony, the best-known supertree method) approaches the accuracy of maximum likelihood methods on supermatrix datasets under realistic conditions.
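For readers unfamiliar with MRP, the matrix-encoding step it is named for can be sketched as follows. This is a minimal illustration of the standard Baum-Ragan coding only, not the SuperFine procedure and not the authors' code: each non-root internal node of each source tree becomes one column, with "1" for taxa inside that clade, "0" for taxa in the same source tree but outside the clade, and "?" for taxa missing from that source tree. The nested-tuple tree representation and the names `clades` and `mrp_matrix` are assumptions made for this example; the parsimony search that MRP then runs on the matrix is not shown.

```python
def clades(tree, out):
    """Return the leaf set below `tree`; append each internal node's leaf set to `out`.

    A tree is either a leaf name (str) or a tuple of subtrees (hypothetical format).
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def mrp_matrix(source_trees):
    """Build the MRP character matrix as a dict: taxon name -> string over '0', '1', '?'."""
    # First pass: collect the taxon set and the clade list of every source tree.
    per_tree = []
    for tree in source_trees:
        found = []
        tree_taxa = clades(tree, found)
        per_tree.append((tree_taxa, found))

    all_taxa = sorted(set().union(*(taxa for taxa, _ in per_tree)))
    rows = {taxon: [] for taxon in all_taxa}

    # Second pass: one matrix column per informative clade.
    for tree_taxa, found in per_tree:
        for clade in found:
            # The root clade (all taxa of the tree) and single-leaf sets carry no signal.
            if clade == tree_taxa or len(clade) < 2:
                continue
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in tree_taxa:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")  # taxon absent from this source tree

    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two small rooted source trees on overlapping taxon sets.
example_trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
example_matrix = mrp_matrix(example_trees)
for name, row in example_matrix.items():
    print(name, row)
```

The "?" entries are what let source trees on different taxon subsets be combined into a single matrix, which is the property that makes MRP usable as a supertree method.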
Created:
2011-05-16



