Site-specific structure and stability constrained substitution models improve phylogenetic inference
Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.6wwpzgn2g
Description:
In previous studies, we presented site-specific substitution models of protein evolution based on selection on the folding stability of the native state (Stab-CPE), which predict the evolutionary variability across protein sites more realistically. However, these Stab-CPE models show qualitative differences from observed data, probably because they ignore changes in the native structure, even though empirical studies suggest that conservation of the native structure is a stronger selective force than selection on folding stability.
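A site-specific model assigns each alignment column its own amino-acid distribution, and the column's sequence entropy measures how variable that site is expected to be. The Stab-CPE formulas are not reproduced here. As a hypothetical stand-in, the sketch below assumes that the probability of amino acid a at a site is proportional to a background frequency times exp(-lam * penalty(a)), where `penalty` and `lam` are illustrative names for a per-site selective cost and a selection strength. Under this assumption, stronger selection yields lower entropy, which is what "more stringent" means in terms of predicted variability.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def site_profile(penalty, lam=1.0, background=None):
    """Hypothetical site-specific amino-acid profile.

    Assumes P(a) is proportional to background(a) * exp(-lam * penalty(a)).
    This is an illustrative stand-in, not the Stab-CPE formula.
    """
    penalty = np.asarray(penalty, dtype=float)
    if background is None:
        # uniform background frequencies over the 20 amino acids
        background = np.full(penalty.shape, 1.0 / penalty.size)
    logw = np.log(background) - lam * penalty
    logw -= logw.max()  # shift before exponentiating, for numerical stability
    w = np.exp(logw)
    return w / w.sum()

def site_entropy(p):
    """Shannon entropy (in nats) of one site's amino-acid distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum())

# One hypothetical site whose amino acids carry arbitrary penalties 0, 1, ..., 19.
penalty = np.arange(len(AMINO_ACIDS), dtype=float)
for lam in (0.0, 0.5, 2.0):
    print(lam, round(site_entropy(site_profile(penalty, lam)), 3))
```

With `lam = 0` the profile equals the background and the entropy is ln 20; increasing `lam` concentrates probability on low-penalty amino acids and lowers the entropy.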
Here we present novel structurally constrained substitution models (Str-CPE) built on two ingredients: Julian Echave's model of the structural change caused by a mutation, treated as the linear response of the protein to a perturbation, and an explicit model of the perturbation generated by a specific amino-acid mutation. Compared with our previous Stab-CPE models, the novel Str-CPE models are more stringent (they predict lower sequence entropy and lower substitution rates), assign higher likelihood to multiple sequence alignments (MSAs) that include one or more known structures, and better predict the observed conservation across sites. Models that combine Str-CPE and Stab-CPE are even more stringent and fit the empirical MSAs better.
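The linear-response idea can be illustrated with an elastic network: if H is the Hessian of the network's energy and f is the force perturbation caused by a mutation, the structural change is the displacement dr solving H dr = f. The sketch below is a minimal illustration under stated assumptions, not the Str-CPE implementation. It assumes an anisotropic network model over C-alpha positions with a 10 Å cutoff, and a hypothetical perturbation (`mutation_force`) that applies equal and opposite unit forces along every contact of the mutated site; the paper's explicit amino-acid-specific perturbation model is not reproduced. Because these forces have zero net force and torque, the system can be solved on the non-zero modes, with rigid-body motions projected out.

```python
import numpy as np

def anm_hessian(coords, cutoff=10.0, gamma=1.0):
    """Anisotropic network model Hessian (3N x 3N) from C-alpha coordinates."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            H[3*i:3*i+3, 3*i:3*i+3] -= block  # diagonal = minus sum of off-diagonals
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H

def mutation_force(coords, site, cutoff=10.0, magnitude=1.0):
    """Hypothetical perturbation: equal and opposite forces along each contact of `site`."""
    f = np.zeros(coords.shape, dtype=float)
    for j in range(len(coords)):
        if j == site:
            continue
        d = coords[j] - coords[site]
        r = np.linalg.norm(d)
        if r <= cutoff:
            u = d / r
            f[j] += magnitude * u
            f[site] -= magnitude * u
    return f.ravel()

def linear_response(H, f, tol=1e-6):
    """Minimum-norm solution of H dr = f using only the non-zero modes of H."""
    w, V = np.linalg.eigh(H)
    keep = w > tol * w.max()  # drop the (near-)zero rigid-body modes
    Vk = V[:, keep]
    return Vk @ ((Vk.T @ f) / w[keep])

# Toy structure: a helix-like trace (radius 2.3 Å, 100° per residue, rise 1.5 Å).
t = np.arange(30)
angle = np.radians(100.0 * t)
coords = np.column_stack([2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * t])
H = anm_hessian(coords)
f = mutation_force(coords, site=15)
dr = linear_response(H, f).reshape(-1, 3)
print(np.linalg.norm(dr, axis=1).round(3))  # per-residue displacement magnitude
```

Projecting out the zero modes rather than inverting H directly avoids the singularity caused by rigid translations and rotations; the resulting dr has no net translation.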
We refer collectively to our models as structure and stability constrained substitution models (SSCPE). Importantly, compared with traditional empirical substitution models, the SSCPE models infer phylogenetic trees of distantly related proteins that are more similar to reference trees based on structural information.
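The description does not state how similarity between inferred and reference trees is measured. One common choice, assumed here purely for illustration, is the Robinson-Foulds distance: the number of bipartitions (splits) found in one tree but not the other. The sketch represents trees as nested Python tuples of taxon names, a hypothetical format chosen to stay self-contained; real RAxML-NG output is in Newick format and would need to be parsed first.

```python
def splits(tree, taxa):
    """Non-trivial bipartitions of an unrooted tree written as nested tuples.

    Each split is keyed by the side that does not contain a fixed reference
    taxon, so the same split gets the same key however the tree is rooted.
    """
    taxa = frozenset(taxa)
    ref = min(taxa)
    found = set()

    def clade(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(clade(child) for child in node))
        side = below if ref not in below else taxa - below
        if 2 <= len(side) <= len(taxa) - 2:  # skip trivial (leaf or whole-tree) splits
            found.add(side)
        return below

    clade(tree)
    return found

def robinson_foulds(tree_a, tree_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a, taxa) ^ splits(tree_b, taxa))

taxa = ["A", "B", "C", "D", "E"]
reference = (("A", "B"), "C", ("D", "E"))
inferred = (("A", "C"), "B", ("D", "E"))
print(robinson_foulds(reference, inferred, taxa))
```

A smaller distance means the inferred tree shares more splits with the reference; dividing by 2(n - 3) for n taxa gives a normalized value between 0 and 1 for fully resolved trees.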
We implemented the SSCPE models in the program SSCPE.pl, freely available at https://github.com/ugobas/SSCPE. Given a concatenated alignment and a list of protein structures that overlap with it, SSCPE.pl infers phylogenetic trees under the SSCPE models using the program RAxML-NG.
Methods
The data were generated with the programs tnm (torsional network model; Mendez and Bastolla 2010) and Prot_evol, whose latest version is presented in the paper associated with this dataset.
Created:
2024-07-02



