
Data from: The identification of the closest living relative(s) of tetrapods: phylogenomic lessons for resolving short ancient internodes

DataONE, updated 2016-06-09, indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
Identifying the closest living relative(s) of tetrapods is an important, yet still contested, question in vertebrate phylogenetics. Three hypotheses are possible, and ruling out alternatives has proven difficult even with large molecular data sets because of weak phylogenetic signal coupled with non-phylogenetic noise resulting from relatively rapid speciation events that occurred a long time ago (>400 Ma). Here, we revisit the identity of the closest living relative(s) of land vertebrates from a phylogenomic perspective and include new genomic data for all extant lungfish genera. RNA-seq proves to be a great alternative to genomic sequencing, which is currently not technically feasible in lungfishes because of their huge (50-130 Gb) and repetitive genomes. We examined the most important sources of systematic error, namely long-branch attraction, compositional heterogeneity and distribution of missing data, and applied different correction techniques. A multispecies coalescent approach is used to account for deep coalescence that might come from the short and deep internodes separating early sarcopterygian splits. Concatenation methods favored lungfishes as the closest living relatives of tetrapods with strong statistical support. Amino acid profile mixture models can unambiguously resolve this difficult internode thanks to their ability to reduce systematic error. We assessed the performance of different site-heterogeneous models and data partitioning, and compared the ability of different strategies designed to overcome long-branch attraction, including taxon manipulation, reduction of among-lineage rate heterogeneity and removal of fast-evolving or compositionally heterogeneous positions. The identification of lungfish as the sister group of tetrapods is robust to the effects of non-stationary composition and distribution of missing data. The multispecies coalescent method reconstructed strongly supported topologies that were congruent with concatenation, despite pervasive gene tree heterogeneity. We reject alternative topologies for early sarcopterygian relationships by increasing the signal-to-noise ratio in our alignments. The analytical pipeline outlined here combines probabilistic phylogenomic inference with methods for evaluating data quality, model adequacy and assessing systematic error, and thus is likely to help resolve similarly difficult internodes in the tree of life.
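The description names compositional heterogeneity and the distribution of missing data as sources of systematic error, but this record does not say how they were measured. The following is only a minimal illustrative sketch of two simple per-taxon summaries one might compute on an amino acid alignment; it is not the authors' pipeline. The function names, the choice of `-`, `?` and `X` as missing-data symbols, the sum-of-absolute-differences deviation score, and the toy sequences are all assumptions made for illustration and are not taken from the dataset.

```python
from collections import Counter

# Assumption: these characters mark gaps or unknown residues.
MISSING = set("-?X")


def missing_fraction_per_taxon(alignment):
    """Return the fraction of missing characters in each aligned sequence."""
    return {
        name: sum(ch in MISSING for ch in seq) / len(seq)
        for name, seq in alignment.items()
    }


def composition(seq):
    """Return amino acid frequencies of a sequence, ignoring missing characters."""
    counts = Counter(ch for ch in seq if ch not in MISSING)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def composition_deviation(alignment):
    """Return, per taxon, the summed absolute difference between its amino acid
    frequencies and the unweighted mean frequencies over all taxa.
    Larger values hint at a compositionally unusual taxon."""
    freqs = {name: composition(seq) for name, seq in alignment.items()}
    residues = sorted({aa for f in freqs.values() for aa in f})
    n_taxa = len(freqs)
    mean = {
        aa: sum(f.get(aa, 0.0) for f in freqs.values()) / n_taxa
        for aa in residues
    }
    return {
        name: sum(abs(f.get(aa, 0.0) - mean[aa]) for aa in residues)
        for name, f in freqs.items()
    }


if __name__ == "__main__":
    # Made-up toy alignment, not real data from this record.
    toy = {
        "lungfish": "MKVLA-AGK",
        "coelacanth": "MKVLAAAGK",
        "tetrapod": "MKV??AAGK",
    }
    print(missing_fraction_per_taxon(toy))
    print(composition_deviation(toy))
```

Summaries like these only flag taxa that may deserve closer inspection; formal tests of compositional homogeneity and model-based corrections would require dedicated phylogenetic software.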

Created:
2016-06-09