
Uneven missing data skew phylogenomic relationships within the lories and lorikeets

Source: NIAID Data Ecosystem, indexed 2026-03-13
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.n5tb2rbsp
Description:
This dataset contains the supplementary data for Smith, B. T., Mauck, W. M., Benz, B., & Andersen, M. J. (2018). Uneven missing data skews phylogenomic relationships within the lories and lorikeets. BioRxiv, 398297. The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens; these patterns were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach in which we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9x more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
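The data reduction approach mentioned in the description (excluding sites based on data completeness) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy alignment, the set of characters treated as missing, and the reading of "70% data completeness" as "at least 70% of samples have a called base at the site" are all assumptions made here for illustration.

```python
# Hypothetical sketch: drop alignment columns whose data completeness
# falls below a threshold. Not taken from the dataset or the paper.

# Characters treated as missing data (an assumption for this example).
MISSING = set("-?Nn")


def site_completeness(alignment):
    """Return, for each column, the fraction of samples with a called base."""
    n_samples = len(alignment)
    n_sites = len(next(iter(alignment.values())))
    fractions = []
    for i in range(n_sites):
        present = sum(1 for seq in alignment.values() if seq[i] not in MISSING)
        fractions.append(present / n_samples)
    return fractions


def filter_sites(alignment, min_completeness=0.70):
    """Keep only columns whose completeness is at least min_completeness."""
    fractions = site_completeness(alignment)
    keep = [i for i, f in enumerate(fractions) if f >= min_completeness]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment: historical samples carry more missing characters.
toy = {
    "modern_1":     "ACGTAC",
    "modern_2":     "ACGTAC",
    "historical_1": "A-GT?C",
    "historical_2": "AC-TNC",
}

filtered = filter_sites(toy, min_completeness=0.70)
for name, seq in filtered.items():
    print(name, seq)
```

In the toy alignment, the fifth column is called in only two of four samples (50% completeness), so it is the only column removed at the 70% threshold.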

Created:
2022-07-21