Data from: Phylogenetic tree estimation with and without alignment: new distance methods and benchmarking
Source: DataONE; updated 2016-08-24; indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first introduces PaHMM-Tree, a novel neighbor joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For close to moderately divergent sequence data we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences we find the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference. Finally, we find that the accuracy of alignment-free methods tends to decline faster than standard two-step methods in the presence of alignment uncertainty, and identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods even in the presence of substantial alignment error.
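Because the description calls PaHMM-Tree a neighbor joining-based method, it may help to see what the tree-building step looks like once pairwise distances are available. The following is a minimal sketch of textbook neighbor joining, not the PaHMM-Tree implementation: the function name `neighbor_joining`, the `nodeN` labels for internal nodes, and the five-taxon example matrix are all illustrative assumptions, and the statistical-alignment step that produces PaHMM-Tree's distances is not reproduced here.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree from a symmetric distance matrix.

    Assumes at least two taxa and that no label looks like "node0",
    "node1", ... (those names are used for internal nodes).
    Returns a list of (node, node, branch_length) edges.
    """
    # Copy the input into a dict-of-dicts so nodes can be added and dropped.
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(labels) if i != j}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # r[a] is the summed distance from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        new = f"node{next_id}"
        next_id += 1
        # Branch lengths from the joined pair to their new parent node.
        len_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        edges += [(i, new, len_i), (j, new, len_j)]
        # Distances from the new node to every remaining node.
        d[new] = {}
        for k in nodes:
            if k not in (i, j):
                d[new][k] = d[k][new] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [new]

    # Two nodes remain; connect them with their current distance.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Illustrative distances for five hypothetical taxa (not data from the study).
example_labels = ["a", "b", "c", "d", "e"]
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

for u, v, length in neighbor_joining(example_labels, example_matrix):
    print(f"{u} -- {v}: {length:.2f}")
```

In a pipeline like the one described, only the distance matrix would change between methods: distances from a fixed alignment, from statistical alignment, or from an alignment-free measure could all be passed to the same joining routine, which is what makes the benchmark comparison of distance estimators possible.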
Created:
2016-08-24



