
Data from: The effects of model choice and mitigating bias on the ribosomal tree of life

Source: DataONE · Updated 2013-06-13 · Indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Deep-level relationships within Bacteria, Archaea, and Eukarya, as well as the relationships of these three domains to one another, remain unresolved. The ribosomal machinery, universal to all cellular life, represents a protein repertoire resistant to horizontal gene transfer (HGT). It therefore provides a largely congruent signal, which is necessary for reconstructing a tree suitable as a backbone for life's reticulate history. Here, we generate a ribosomal tree of life from a robust taxonomic sampling of Bacteria, Archaea, and Eukarya to elucidate deep-level intra-domain and inter-domain relationships. Conflicting phylogenetic signals from HGT and/or paralogy are compounded by a lack of phylogenetic information and by systematic errors caused by inadequate models (those that cannot account for substitution-rate or compositional heterogeneity) or by improper model selection. We therefore tested several models of varying sophistication on three different datasets, removed fast-evolving or long-branched Archaea and Eukarya, and employed three different strategies to remove compositional heterogeneity, in order to examine the effects of each on the topological outcome. Our results support a two-domain topology for the tree of life, in which Eukarya emerges from within Archaea as sister to a Korarchaeota/Thaumarchaeota (KT) clade or a Crenarchaeota/KT clade, for all models under all, or at least one, of the strategies employed. Taxonomic manipulation causes single-matrix models and certain mixture models to vacillate between two-domain and three-domain phylogenies. We find that models vary in their ability to resolve different areas of the tree of life, and that this ability does not necessarily correlate with model complexity. For example, both single-matrix models and some mixture models recover the archaeal phyla Crenarchaeota and Euryarchaeota as monophyletic. In contrast, the most sophisticated model recovers a paraphyletic Euryarchaeota but detects two large clades comprising the Bacteria, which the other models recovered separately but never together.
Overall, models recovered consistent topologies despite the dataset modifications made to remove compositional bias, which reflects either ineffective bias reduction or robust datasets that allow models to overcome reconstruction artifacts. We recommend a comparative approach to evolutionary models in order to identify model weaknesses as well as consensus relationships.

Created:
2013-06-13