Data from: ProtASR: an evolutionary framework for ancestral protein reconstruction with selection on folding stability
Source: DataONE. Updated: 2017-01-06. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Resource description:
The computational reconstruction of ancestral proteins provides information on past biological events and has practical implications for biomedicine and biotechnology. Currently available tools for ancestral sequence reconstruction (ASR) are often based on empirical amino acid substitution models that assume that all sites evolve at the same rate and under the same process. However, this assumption is frequently violated because protein evolution is highly heterogeneous due to different selective constraints among sites. Here, we present ProtASR, a new evolutionary framework to infer ancestral protein sequences accounting for selection on protein stability. First, ProtASR generates site-specific substitution matrices through the structurally constrained mean-field substitution model (MF), which considers both unfolding and misfolding stability. We previously showed that MF models outperform empirical amino acid substitution models, as well as other structurally constrained substitution models, both in terms of likelihood and in correctly inferring amino acid distributions across sites. In the second step, ProtASR adapts a well-established maximum-likelihood (ML) ASR procedure to infer ancestral proteins under MF models. A known bias of ML ASR methods is that they tend to overestimate the stability of ancestral proteins by underestimating the frequency of deleterious mutations. We compared ProtASR under MF to two empirical substitution models (JTT and CAT), reconstructing the ancestral sequences of simulated proteins. ProtASR yields reconstructed proteins with less biased stabilities, which are significantly closer to those of the simulated proteins. Analysis of extant protein families suggests that folding stability evolves through time across protein families, potentially reflecting neutral fluctuation. Some families exhibit a more constant protein folding stability, while others are more variable.
ProtASR is freely available from https://github.com/miguelarenas/protasr and includes detailed documentation and ready-to-use examples. Run time ranges from seconds to minutes, depending on protein length and alignment size.
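The idea of combining site-specific substitution probabilities with maximum-likelihood reconstruction can be illustrated with a toy sketch. This is not ProtASR code and does not implement the MF model. It assumes a four-letter alphabet, a star tree whose leaves attach directly to the ancestor, and a simple F81-style substitution process whose equilibrium frequencies differ per site. All function names and numbers are hypothetical.

```python
import math

# Toy alphabet (assumption): real amino acid models use 20 states.
ALPHABET = "AVLK"


def transition_prob(i, j, t, freqs):
    """F81-style probability of state j after time t, starting from state i.

    freqs are the site-specific equilibrium frequencies. They stand in for
    the site-specific matrices mentioned above, but are NOT the MF model.
    """
    # Scale the rate so that t is measured in expected substitutions per site.
    beta = 1.0 / (1.0 - sum(f * f for f in freqs))
    decay = math.exp(-beta * t)
    same = 1.0 if i == j else 0.0
    return decay * same + (1.0 - decay) * freqs[j]


def marginal_root_posterior(leaf_states, branch_lengths, freqs):
    """Posterior over the ancestral state at one site of a star tree."""
    joint = []
    for a in range(len(ALPHABET)):
        # Prior on the ancestral state times the likelihood of each leaf.
        like = freqs[a]
        for state, t in zip(leaf_states, branch_lengths):
            like *= transition_prob(a, ALPHABET.index(state), t, freqs)
        joint.append(like)
    total = sum(joint)
    return {ALPHABET[a]: joint[a] / total for a in range(len(ALPHABET))}


def ml_state(posterior):
    """Return the state with the highest marginal posterior probability."""
    return max(posterior, key=posterior.get)


# Same observed leaves, two different hypothetical site profiles.
uniform = [0.25, 0.25, 0.25, 0.25]
prefers_v = [0.1, 0.6, 0.2, 0.1]
post_uniform = marginal_root_posterior(["A", "A", "V"], [0.1, 0.1, 0.1], uniform)
post_site = marginal_root_posterior(["A", "V"], [0.5, 0.5], prefers_v)
print(ml_state(post_uniform), ml_state(post_site))
```

In this sketch the inferred ancestral state depends on the per-site frequencies as well as on the observed leaves, which is the general reason a site-specific model can give a different reconstruction from a single empirical matrix applied to all sites.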
Created:
2017-01-06



