Data from: Phylogenetic tree estimation with and without alignment: new distance methods and benchmarking
Source: DataCite Commons; updated 2025-06-01; indexed 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.n5r49
Summary:
Phylogenetic tree inference is a critical component of many systematic and
evolutionary studies. The majority of these studies are based on the
two-step process of multiple sequence alignment followed by tree
inference, despite persistent evidence that the alignment step can lead to
biased results. Here we present a two-part study that first introduces
PaHMM-Tree, a novel neighbour-joining-based method that estimates pairwise
distances without assuming a single alignment. We then use simulations to
benchmark its performance against a wide range of other phylogenetic tree
inference methods, including the first comparison of alignment-free
distance-based methods against more conventional tree estimation methods.
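PaHMM-Tree's own distances come from pair-HMM statistical alignment, which is not reproduced here. As a hedged point of reference, the sketch below shows only the conventional distance route the abstract compares against: a Jukes-Cantor (JC69) distance computed from one fixed alignment, fed into standard Saitou-Nei neighbour joining. The function names `jc69_distance` and `neighbor_joining` are hypothetical and are not part of PaHMM-Tree or this dataset.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance between two already-aligned DNA sequences.

    Columns with a gap ('-') in either sequence are skipped, so the estimate
    depends entirely on the single alignment it is given.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        sites += 1
        diffs += a != b
    if sites == 0:
        raise ValueError("no gap-free sites to compare")
    p = diffs / sites  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def neighbor_joining(names: list[str], matrix: list[list[float]]) -> str:
    """Saitou-Nei neighbour joining; returns an unrooted Newick string.

    Needs at least three taxa. Clusters are keyed by their Newick text, so
    leaf names must be unique and must not contain Newick punctuation.
    """
    if len(names) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    # d[x][y]: current distance between active clusters x and y
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    while len(active) > 3:
        n = len(active)
        r = {x: sum(d[x][y] for y in active) for x in active}  # row sums
        # Pick the pair minimising Q(x, y) = (n - 2) d(x, y) - r(x) - r(y)
        best = None
        for i, x in enumerate(active):
            for y in active[i + 1:]:
                q = (n - 2) * d[x][y] - r[x] - r[y]
                if best is None or q < best[0]:
                    best = (q, x, y)
        _, x, y = best
        # Branch lengths from the new internal node u to x and to y
        bx = 0.5 * d[x][y] + (r[x] - r[y]) / (2 * (n - 2))
        by = d[x][y] - bx
        u = f"({x}:{bx:.4f},{y}:{by:.4f})"
        # Distance from u to every remaining cluster z
        d[u] = {u: 0.0}
        for z in active:
            if z not in (x, y):
                d[u][z] = d[z][u] = 0.5 * (d[x][z] + d[y][z] - d[x][y])
        active = [z for z in active if z not in (x, y)] + [u]
    # Join the last three clusters at a single central node
    x, y, z = active
    bx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    by = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    bz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{bx:.4f},{y}:{by:.4f},{z}:{bz:.4f});"
```

On the small additive textbook matrix `[[0,5,9,9,8],[5,0,10,10,9],[9,10,0,8,7],[9,10,8,0,3],[8,9,7,3,0]]` for taxa `a` to `e` (illustrative data, not from this dataset), `neighbor_joining` returns `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`. Because `jc69_distance` trusts a single alignment, any alignment error feeds straight into the distances; this is the weakness that a statistical-alignment distance is meant to avoid.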
Our new method for calculating pairwise distances based on statistical
alignment provides distance estimates that are as accurate as those
obtained using standard methods based on the true alignment. Pairwise
distance estimates based on the two-step process tend to be substantially
less accurate. This improved performance carries through to tree
inference, where PaHMM-Tree provides more accurate tree estimates than all
of the pairwise distance methods assessed. For closely to moderately
divergent sequence data, we find that two-step methods using statistical
inference, where information from all sequences is included in the
estimation procedure, tend to perform better than PaHMM-Tree; this is
especially true of full statistical alignment, which simultaneously
estimates both the tree and the alignment. For deep divergences we find the
alignment step becomes so prone to error that our distance-based
PaHMM-Tree outperforms all other methods of tree inference. Finally, we
find that the accuracy of alignment-free methods tends to decline faster
than standard two-step methods in the presence of alignment uncertainty,
and identify no conditions under which alignment-free methods are as
accurate as, or more accurate than, standard phylogenetic methods, even in
the presence of substantial alignment error.
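The summary does not name the alignment-free methods that were benchmarked. As a hypothetical illustration of the general idea only, the sketch below compares two unaligned sequences through their shared k-mers and converts the Jaccard index J into a Mash-style distance, -ln(2J / (1 + J)) / k. Mash itself estimates J from MinHash sketches; here J is computed exactly, and the function names are assumptions made for this example.

```python
import math


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings (k-mers) of a sequence; no alignment needed."""
    seq = seq.upper().replace("-", "")  # drop any gap characters
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of k-mers shared between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_style_distance(seq_a: str, seq_b: str, k: int = 5) -> float:
    """Mash-style distance -ln(2J / (1 + J)) / k from the exact k-mer Jaccard index J."""
    j = jaccard(kmer_set(seq_a, k), kmer_set(seq_b, k))
    if j == 0.0:
        return math.inf  # no shared k-mers: the distance is unbounded
    return -math.log(2.0 * j / (1.0 + j)) / k
```

A single inserted base only disturbs the k-mers that span it, so no alignment step is required. The trade-off is that shared k-mers vanish quickly as sequences diverge, which is consistent with the faster loss of accuracy reported above.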
Provider:
Dryad
Created:
2016-08-24



