Data from: Modeling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.gv1q5
Description:
Proteins have distinct structural and functional constraints at different
sites that lead to site-specific preferences for particular amino acid
residues as the sequences evolve. Heterogeneity in the amino acid
substitution process between sites is not modeled by commonly used
empirical amino acid exchange matrices. Such model misspecification can
lead to artefacts in phylogenetic estimation such as long-branch
attraction. Although sophisticated site-heterogeneous mixture models have
been developed to address this problem in both Bayesian and maximum
likelihood (ML) frameworks, their formidable computational time and memory
usage severely limit their use in large phylogenomic analyses. Here we
propose a posterior mean site frequency (PMSF) method as a rapid and
efficient approximation to full empirical profile mixture models for ML
analysis. The PMSF approach assigns a conditional mean amino acid
frequency profile to each site calculated based on a mixture model fitted
to the data using a preliminary guide tree. These PMSF profiles can then
be used for in-depth tree-searching in place of the full mixture model.
Compared with widely used empirical mixture models with k classes, our
implementation of PMSF in IQ-TREE (http://www.iqtree.org) speeds up the
computation by approximately k/1.5-fold and requires a small fraction of
the RAM. Furthermore, this speedup allows, for the first time, full
nonparametric bootstrap analyses to be conducted under complex
site-heterogeneous models on large concatenated data matrices. Our
simulations and empirical data analyses demonstrate that PMSF can
effectively ameliorate long-branch attraction artefacts. In some empirical
and simulation settings PMSF provided more accurate estimates of
phylogenies than the mixture models from which they derive.
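To make the "conditional mean amino acid frequency profile" described above concrete, here is a minimal sketch in Python. It is not the IQ-TREE implementation. It assumes that the fitted mixture model supplies class weights and class frequency profiles, and that the log-likelihood of every site under every class has already been computed on the guide tree. The function name `pmsf_profiles`, its parameter names, and all numbers in the toy example are hypothetical.

```python
import numpy as np

def pmsf_profiles(class_weights, class_profiles, site_class_loglik):
    """Return one posterior mean frequency profile per site.

    class_weights:     shape (k,), mixture weights summing to 1
    class_profiles:    shape (k, 20), amino acid frequencies of each class
    site_class_loglik: shape (n_sites, k), log-likelihood of each site under
                       each class on the guide tree (assumed to be given)
    """
    w = np.asarray(class_weights, dtype=float)
    profiles = np.asarray(class_profiles, dtype=float)
    loglik = np.asarray(site_class_loglik, dtype=float)

    # log(w_c * L_ic), shifted per site so the exponentials do not underflow
    log_joint = np.log(w) + loglik
    log_joint -= log_joint.max(axis=1, keepdims=True)

    # Posterior probability of each class at each site
    posterior = np.exp(log_joint)
    posterior /= posterior.sum(axis=1, keepdims=True)

    # Posterior-weighted average of the class profiles: one profile per site
    return posterior @ profiles

# Toy example with 3 hypothetical classes and 2 sites
rng = np.random.default_rng(0)
class_profiles = rng.dirichlet(np.ones(20), size=3)
class_weights = np.array([0.5, 0.3, 0.2])
site_class_loglik = np.array([
    [-10.0, -30.0, -30.0],  # site strongly favours class 0
    [-12.0, -12.0, -12.0],  # site carries no information about the class
])
pmsf = pmsf_profiles(class_weights, class_profiles, site_class_loglik)
print(pmsf.shape)
print(pmsf.sum(axis=1))
```

In this toy case the first site's profile is essentially that of class 0, while the second site's profile is the weight-averaged mixture of all three classes. Because each site then carries a single fixed profile rather than k classes, the tree search no longer has to evaluate every class at every site, which is consistent with the roughly k/1.5-fold speedup reported above (about 13-fold for k = 20).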
Provider:
Dryad
Created:
2017-08-04



