
Data for: Gene tree estimation error with ultraconserved elements: An empirical study on Pseudapis bees

Source: Mendeley Data | Updated: 2024-05-10 | Indexed: 2024-06-27
Download link:
https://zenodo.org/records/4282988
Description:
Summarizing individual gene trees to species phylogenies using two-step coalescent methods is now a standard strategy in the field of phylogenomics. However, practical implementations of summary methods suffer from gene tree estimation error, which is caused by various biological and analytical factors. Greatly understudied is the choice of gene tree inference method and downstream effects on species tree estimation for empirical data sets. To better understand the impact of this method choice on gene and species tree accuracy, we compare gene trees estimated through four widely used programs under different model-selection criteria: PhyloBayes, MrBayes, IQ-Tree and RAxML. We study their performance in the phylogenomic framework of > 800 ultraconserved elements from the bee subfamily Nomiinae (Halictidae). Our taxon sampling focuses on the genus Pseudapis, a distinct lineage with diverse morphological features, but contentious morphology-based taxonomic classifications and no molecular phylogenetic guidance. We approximate topological accuracy of gene trees by assessing their ability to recover two uncontroversial, monophyletic groups, and compare branch lengths of individual trees using the stemminess metric (the relative length of internal branches). We further examine different strategies of removing uninformative loci and the collapsing of weakly supported nodes into polytomies. We then summarize gene trees with ASTRAL and compare resulting species phylogenies, including comparisons to concatenation-based estimates. Gene trees obtained with the reversible jump model search in MrBayes were most concordant on average and all Bayesian methods yielded gene trees with better stemminess values. The only gene tree estimation approach whose ASTRAL summary trees consistently produced the most likely correct topology, however, was IQ-Tree with automated model designation (MFP). 
We discuss these findings and provide practical advice on gene tree estimation for summary methods. Lastly, we establish the first phylogeny-informed classification for Pseudapis s. l. and map the distribution of distinct morphological features of the group.
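To illustrate the stemminess metric mentioned in the description (the relative length of internal branches), here is a minimal Python sketch. It assumes stemminess is the fraction of total tree length carried by internal branches; the study's own scripts may define it differently. The function names `branch_lengths` and `stemminess` are hypothetical, and the regex-based parsing only handles simple Newick strings with unquoted labels.

```python
import re

# Matches a number such as 0.12, 3, or 1e-05.
_NUM = r"([0-9.eE+-]+)"

def branch_lengths(newick):
    """Return (internal, terminal) branch-length lists for a Newick string.

    Assumption: labels are unquoted and contain no parentheses, commas,
    colons, or semicolons. A length that follows ")" (optionally with a
    support label in between) is treated as an internal branch; a length
    that follows a leaf label is treated as a terminal branch.
    """
    internal = [float(x) for x in re.findall(r"\)[^(),:;]*:" + _NUM, newick)]
    terminal = [
        float(x)
        for x in re.findall(r"[(,]\s*[^(),:;\s][^(),:;]*:" + _NUM, newick)
    ]
    return internal, terminal

def stemminess(newick):
    """Share of total tree length on internal branches (assumed definition)."""
    internal, terminal = branch_lengths(newick)
    total = sum(internal) + sum(terminal)
    if total == 0:
        raise ValueError("tree has no branch lengths")
    return sum(internal) / total

if __name__ == "__main__":
    # Toy tree, not taken from the data set.
    print(stemminess("((A:0.5,B:0.5)95:1.0,C:2.0);"))
```

Under this definition, higher values mean that more of the tree length lies on internal branches relative to the tips.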

Created:
2023-06-28