Data from: Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages

Name: Data from: Invariant versus classical quartet inference when evolution is heterogeneous across sites and lineages
Creator: Dryad
Published: 2025-05-01 02:40:29
License: Not specified

DataCite Commons: updated 2025-05-01, indexed 2025-05-10

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.jk7t0

Resource description:

One reason classical phylogenetic reconstruction methods fail to infer the underlying topology correctly is that they assume oversimplified models. In this paper we propose a quartet reconstruction method consistent with the most general Markov model of nucleotide substitution, which can also deal with data coming from mixtures on the same topology. Our proposed method uses phylogenetic invariants and provides a system of weights that can be used as input for quartet-based methods. We study its performance on real data and on a wide range of simulated 4-taxon data (both time-homogeneous and nonhomogeneous, with or without among-site rate heterogeneity, and with different branch length settings). We compare it to the classical methods of neighbor-joining (with paralinear distance), maximum likelihood (with different underlying models), and maximum parsimony. Our results show that this method is accurate and robust, performs similarly to ML when the data satisfy the assumptions of both methods, and outperforms the other methods when these are based on inappropriate substitution models. If alignments are long enough, it also outperforms the other methods when some of its assumptions are violated.
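
To make one of the baselines named above more concrete (neighbor-joining with paralinear distance), here is a minimal Python sketch. It is not the invariant-based method of the paper and is not taken from the dataset. The function names (`det`, `paralinear`, `best_split`) are hypothetical, the distance is written in one common form without any scaling constant (conventions differ between references, and a positive rescaling does not change the chosen quartet), and sequences are assumed to be gap-free strings over A, C, G, T in which all four nucleotides occur. On four taxa, neighbor-joining reduces to picking the pairing whose within-pair distances have the smallest sum, which is what `best_split` does.

```python
import math

NUCS = "ACGT"

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def paralinear(x, y):
    """Paralinear-style distance: -ln(det J / sqrt(det Dx * det Dy)).

    J is the 4x4 joint frequency matrix of the two aligned sequences;
    Dx and Dy are diagonal matrices of their marginal frequencies.
    No scaling constant is applied (an assumption of this sketch).
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be aligned and non-empty")
    n = len(x)
    J = [[0.0] * 4 for _ in range(4)]
    for s, t in zip(x, y):
        J[NUCS.index(s)][NUCS.index(t)] += 1.0 / n
    fx = [sum(row) for row in J]                             # marginals of x
    fy = [sum(J[i][j] for i in range(4)) for j in range(4)]  # marginals of y
    num = det(J)
    den = math.sqrt(math.prod(fx) * math.prod(fy))
    if num <= 0.0 or den == 0.0:
        raise ValueError("distance undefined for this pair")
    return -math.log(num / den)

def best_split(dist, taxa):
    """Return the pairing of four taxa minimizing the sum of within-pair distances.

    dist maps frozenset({u, v}) to a distance; taxa is a sequence of four names.
    """
    a, b, c, d = taxa
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(
        splits,
        key=lambda s: dist[frozenset(s[0])] + dist[frozenset(s[1])],
    )
```

A typical use would be to fill `dist` with `paralinear` values for the six pairs of a 4-taxon alignment and pass it to `best_split`. The paper's own method instead derives quartet weights from phylogenetic invariants.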

Provider:

Dryad

Created:

2015-11-18
