Data from: Phylogenomics of Lophotrochozoa with consideration of systematic error
Source: DataONE. Updated: 2016-08-29. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Resource description:
Phylogenomic studies have improved understanding of deep metazoan phylogeny and show promise for resolving incongruences among analyses based on limited numbers of loci. One region of the animal tree that has been especially difficult to resolve, even with phylogenomic approaches, is relationships within Lophotrochozoa (the animal clade that includes molluscs, annelids, and flatworms among others). Lack of resolution in phylogenomic analyses could be due to insufficient phylogenetic signal, limitations in taxon and/or gene sampling, or systematic error. Here, we investigated why lophotrochozoan phylogeny has been such a difficult question to answer by identifying and reducing sources of systematic error. We supplemented existing data with 32 new transcriptomes spanning the diversity of Lophotrochozoa and constructed a new set of Lophotrochozoa-specific core orthologs. Of these, 638 orthologous groups (OGs) passed strict screening for paralogy using a tree-based approach. To reduce possible sources of systematic error, we calculated branch-length heterogeneity, evolutionary rate, percent missing data, compositional bias, and saturation for each OG and analyzed increasingly strict subsets containing only the best OGs for these five variables. Principal component analysis of the values for each factor examined for each OG revealed that compositional heterogeneity and average patristic distance contributed most to the variance observed along the first principal component, while branch-length heterogeneity and, to a lesser extent, saturation contributed most to the variance observed along the second. Missing data did not strongly contribute to either. Additional sensitivity analyses examined effects of removing taxa with heterogeneous branch lengths, large amounts of missing data, and compositional heterogeneity.
Although our analyses do not unambiguously resolve lophotrochozoan phylogeny, we advance the field by reducing the list of viable hypotheses. Moreover, our systematic approach for dissection of phylogenomic data can be applied to explore sources of incongruence and poor support in any phylogenomic dataset.
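The per-OG screening described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the metric values below are random placeholders rather than the study's data, the column names and the helper functions `pca` and `strictest_subset` are hypothetical, and the subset rule (mean rank across metrics, with lower values treated as better) is an assumption made only for illustration.

```python
import numpy as np

# Hypothetical column names for the five per-OG factors named in the abstract.
METRICS = [
    "branch_length_heterogeneity",
    "evolutionary_rate",
    "percent_missing",
    "compositional_bias",
    "saturation",
]


def pca(values):
    """PCA of z-scored columns via SVD.

    Returns (explained variance ratio, loadings, scores); row i of the
    loadings holds the weight of each metric on principal component i.
    """
    z = (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained, vt, u * s


def strictest_subset(values, keep_fraction):
    """Indices of the OGs with the lowest mean rank across all metrics.

    Assumption: a lower value is better for every metric.
    """
    ranks = values.argsort(axis=0).argsort(axis=0)  # per-column ranks, 0 = lowest
    mean_rank = ranks.mean(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(values))))
    return np.sort(np.argsort(mean_rank, kind="stable")[:n_keep])


rng = np.random.default_rng(0)
values = rng.random((638, len(METRICS)))  # placeholder values, NOT the real data

explained, loadings, scores = pca(values)
for i in range(2):
    top = METRICS[int(np.argmax(np.abs(loadings[i])))]
    print(f"PC{i + 1}: {explained[i]:.1%} of variance, largest loading: {top}")

# Increasingly strict subsets of OGs; each one is nested in the previous.
for frac in (0.75, 0.50, 0.25):
    print(frac, len(strictest_subset(values, frac)))
```

With real per-OG measurements in place of the random matrix, the loadings show which factors dominate each principal component, and the nested subsets give matrices of decreasing size to reanalyze.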
Created:
2016-08-29



