Probabilistic Phylogenetic Inference with Insertions and Deletions
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Probabilistic_Phylogenetic_Inference_with_Insertions_and_Deletions/149695
Description:
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous-time Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in PHYLIP. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
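The description mentions two ingredients that may be easier to follow with a small example: a rate matrix augmented by the gap character, and the Felsenstein peeling algorithm. The Python sketch below is only an illustration under stated assumptions and is not the dnamlε model. It treats the gap as a fifth state in a simple, reversible toy matrix, whereas the model described above is non-reversible. The rate values, the function names, the nested-tuple tree encoding, and the choice of root distribution are all hypothetical.

```python
STATES = "ACGT-"  # four nucleotides plus the gap character as a fifth state


def rate_matrix(sub=1.0, ins=0.05, dele=0.1):
    """Toy 5x5 rate matrix (assumed rates, not taken from the dataset)."""
    n = len(STATES)
    q = [[0.0] * n for _ in range(n)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = sub / 3.0   # residue -> different residue
        q[i][4] = dele                # residue -> gap (deletion)
        q[4][i] = ins / 4.0           # gap -> residue (insertion)
    for i in range(n):
        q[i][i] = -sum(q[i])          # diagonal makes each row sum to zero
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """P(t) = exp(Q t), via a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def partial_likelihoods(node, q):
    """Peeling step: a leaf is a one-character string, an internal node
    is a tuple (left, t_left, right, t_right)."""
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    out = [1.0] * len(STATES)
    for child, t in ((left, t_left), (right, t_right)):
        p = transition_matrix(q, t)
        below = partial_likelihoods(child, q)
        for i in range(len(STATES)):
            out[i] *= sum(p[i][j] * below[j] for j in range(len(STATES)))
    return out


def column_likelihood(tree, q, root_dist):
    """Probability of one alignment column given the tree and the model."""
    return sum(pi * l for pi, l in zip(root_dist, partial_likelihoods(tree, q)))


def stationary(ins=0.05, dele=0.1):
    """Stationary distribution of the toy matrix above (an assumed root prior)."""
    residue = ins / (ins + dele) / 4.0
    return [residue] * 4 + [dele / (ins + dele)]


if __name__ == "__main__":
    q = rate_matrix()
    # Column "A", "A", "-" on a three-leaf tree with made-up branch lengths.
    tree = (("A", 0.1, "A", 0.2), 0.05, "-", 0.3)
    print(column_likelihood(tree, q, stationary()))
```

Because the gap is handled like any other state, the recursion visits each node once per column, which is the efficiency property of peeling that the description refers to. Multiplying the column likelihoods across all columns would give the likelihood of a full alignment under this toy model.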
Created:
2016-10-28



