Data from: Gaps, an elusive source of phylogenetic information

DataONE | Updated 2012-03-23 | Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Description:

Morrison (2009) raises a fundamental question: “Why would phylogeneticists ignore computerized sequence alignment?” While well aware of the difficulties, he considers the whole issue to be a ‘gaping hole that needs to be filled’. Particularly with the expansion of genomic-scale data, there are many advantages to using automated alignment for phylogenetic analyses, the most obvious being that it is much more efficient and potentially less prone to experimenter bias. It is therefore clearly desirable to automate data preparation as far as possible, but the question remains whether automated sequence alignment can yet recover the full and correct phylogenetic information in the data. In this paper we use an example shorebird dataset to explore three related questions regarding the interplay between alignment and phylogeny estimation: 1) Are gap-rich alignments reliable for phylogenetic inference? 2) How much phylogenetic information is contained in gaps as compared to sequences? 3) Are models of the insertion/deletion process essential, and if so, at what phylogenetic depths? We report that the indel (insertion/deletion) process creates considerable information that is potentially available for phylogenetic inference. Ideally, we should be able to obtain the same tree independently from sequences and from gaps; however, there is still considerable variability in the alignments produced by different programs. We predict that better and more computationally tractable models of the indel process will be required before the information in gaps can be fully exploited for phylogenetic inference.

Created:

2012-03-23
