Towards improving searches for optimal phylogenies
收藏NIAID Data Ecosystem2026-03-08 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.89qf8
下载链接
链接失效反馈官方服务:
资源简介:
Finding the optimal evolutionary history for a set of taxa is a challenging computational problem, even when restricting possible solutions to be “tree-like” and focusing on the maximum-parsimony optimality criterion. This has led to much work on using heuristic tree searches to find approximate solutions. We present an approach for finding exact optimal solutions that employs and complements the current heuristic methods for finding optimal trees. Given a set of taxa and a set of aligned sequences of characters, there may be subsets of characters that are compatible, and for each such subset there is an associated (possibly partially resolved) phylogeny with edges corresponding to each character state change. These perfect phylogenies serve as anchor trees for our constrained search space. We show that, for sequences with compatible sites, the parsimony score of any tree T is at least the parsimony score of the anchor trees plus the number of inferred changes between T and the anchor trees. As the maximum-parsimony optimality score is additive, the sum of the lower bounds on compatible character partitions provides a lower bound on the complete alignment of characters. This yields a region in the space of trees within which the best tree is guaranteed to be found; limiting the search for the optimal tree to this region can significantly reduce the number of trees that must be examined in a search of the space of trees. We analyze this method empirically using four different biological data sets as well as surveying 400 data sets from the TreeBASE repository, demonstrating the effectiveness of our technique in reducing the number of steps in exact heuristic searches for trees under the maximum-parsimony optimality criterion.
为一组分类群(taxa)寻找最优演化历史是一项极具挑战性的计算问题,即便将可行解限定为“树状结构”,且以最大简约性(maximum-parsimony)最优性准则为优化目标。这一现状推动了大量借助启发式树搜索来求解近似解的研究工作。我们提出了一种可求解精确最优解的方法,该方法能够兼容并补充现有最优树启发式求解方法。给定一组分类群与一套经比对的性状序列,其中可能存在若干相容性状子集;针对每个此类子集,均可构建出对应的(可能为部分解析的)系统发育树(phylogeny),其分支对应各性状状态的改变。此类完美系统发育树(perfect phylogenies)将作为约束搜索空间的锚定树。我们证明,对于带有相容位点的序列而言,任意树T的简约分值至少为锚定树的简约分值,加上T与锚定树之间的推断改变次数。由于最大简约性最优分值具备可加性,因此相容性状分区的下界之和可构成完整性状比对序列的下界。这一结论可在树空间中划定一个区域,最优树必然存在于该区域内;将最优树的搜索范围限定于此区域,可大幅缩减树空间搜索过程中需要遍历的树的数量。我们通过4套不同的生物数据集,以及从TreeBASE数据库中调取的400套数据集,对该方法开展了实证分析,结果证实了本方法在基于最大简约性最优性准则的精确启发式树搜索中,可有效缩减搜索步数。
创建时间:
2014-08-28



