Phylogenomic branch length estimation using quartets
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.pg4f4qs3q
Resource description:
Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern phylogenomic analyses often use methods that account for the heterogeneity of evolutionary histories across the genome due to processes such as incomplete lineage sorting. However, these methods typically do not generate branch lengths in units that are usable by downstream applications, forcing phylogenomic analyses to resort to alternative shortcuts such as estimating branch lengths by concatenating gene alignments into a supermatrix. Yet, concatenation and other available approaches for estimating branch lengths fail to address heterogeneity across the genome. In this article, we derive expected values of gene tree branch lengths in substitution units under an extension of the multispecies coalescent (MSC) model that allows substitutions with varying rates across the species tree. We present CASTLES, a new technique for estimating branch lengths on the species tree from estimated gene trees that uses these expected values, and our study shows that CASTLES improves on the most accurate prior methods with respect to both speed and accuracy.
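The unit problem described above can be made concrete with the textbook multispecies coalescent (MSC) result for a single four-taxon quartet. This is not CASTLES. It is the standard relationship that quartet-based summary methods such as ASTRAL use to report internal branch lengths in coalescent units: if the species-tree internal branch has length t (coalescent units), a gene tree matches the species-tree quartet with probability 1 - (2/3)e^(-t), and each of the two alternative quartets appears with probability (1/3)e^(-t). The minimal sketch below inverts that relation from gene-tree quartet counts; the function names are hypothetical and chosen for illustration only.

```python
import math


def quartet_concordance_probability(t: float) -> float:
    """Probability under the MSC that a four-taxon gene tree matches the
    species-tree quartet, given internal branch length t in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


def internal_branch_from_quartets(n_concordant: int, n_total: int) -> float:
    """Estimate t (coalescent units) by inverting the relation above, using the
    fraction of gene trees whose quartet agrees with the species tree."""
    if n_total <= 0 or not 0 <= n_concordant <= n_total:
        raise ValueError("need 0 <= n_concordant <= n_total and n_total > 0")
    p = n_concordant / n_total
    if p <= 1.0 / 3.0:
        # No more concordance than expected for a zero-length branch: clamp at 0.
        return 0.0
    if p == 1.0:
        # Every gene tree is concordant, so the estimate is unbounded.
        return math.inf
    # Solve p = 1 - (2/3) * exp(-t) for t.
    return -math.log(1.5 * (1.0 - p))


# Example: 200 of 300 gene trees agree with the species-tree quartet.
print(internal_branch_from_quartets(200, 300))  # ln 2, about 0.693
```

The result is in coalescent units, which reflect population size and generation time rather than substitutions per site; that is exactly the kind of output the abstract notes is not directly usable downstream. Estimating branch lengths in substitution units, as CASTLES does, requires the model extension described above and is not covered by this sketch. Practical tools also typically add a small pseudocount to avoid the infinite estimate when all gene trees agree.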
Created:
2025-12-03



