Structure-Based Predictive Models for Allosteric Hot Spots

Indexed from the NIAID Data Ecosystem on 2026-03-06

Download link:

https://figshare.com/articles/dataset/Structure_Based_Predictive_Models_for_Allosteric_Hot_Spots/146087

Description:

In allostery, a binding event at one site in a protein modulates the behavior of a distant site. Identifying residues that relay the signal between sites remains a challenge. We have developed predictive models using support-vector machines, a widely used machine-learning method. The training data set consisted of residues classified as either hotspots or non-hotspots based on experimental characterization of point mutations from a diverse set of allosteric proteins. Each residue had an associated set of calculated features. Two sets of features were used, one consisting of dynamical, structural, network, and informatic measures, and another of structural measures defined by Daily and Gray [1]. The resulting models performed well on an independent data set consisting of hotspots and non-hotspots from five allosteric proteins. For the independent data set, our top 10 models using Feature Set 1 recalled 68–81% of known hotspots, and among total hotspot predictions, 58–67% were actual hotspots. Hence, these models have precision P = 58–67% and recall R = 68–81%. The corresponding models for Feature Set 2 had P = 55–59% and R = 81–92%. We combined the features from each set that produced models with optimal predictive performance. The top 10 models using this hybrid feature set had R = 73–81% and P = 64–71%, the best overall performance of any of the sets of models. Our methods identified hotspots in structural regions of known allosteric significance. Moreover, our predicted hotspots form a network of contiguous residues in the interior of the structures, in agreement with previous work. In conclusion, we have developed models that discriminate between known allosteric hotspots and non-hotspots with high accuracy and sensitivity. Moreover, the pattern of predicted hotspots corresponds to known functional motifs implicated in allostery, and is consistent with previous work describing sparse networks of allosterically important residues.
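
The classification and evaluation steps summarized above can be illustrated with a minimal Python sketch. It assumes a linear SVM trained with the Pegasos stochastic subgradient method on the hinge loss, applied to hypothetical two-dimensional feature vectors. The dataset description does not state the kernel, solver, or exact features the authors used, so `train_linear_svm`, `predict`, and `precision_recall` are illustrative names and not the authors' code. Precision is TP / (TP + FP), the fraction of predicted hotspots that are real hotspots; recall is TP / (TP + FN), the fraction of known hotspots the model recovers.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM with Pegasos-style stochastic subgradient descent.

    Sketch only, not the authors' pipeline.
    X: list of feature vectors (hypothetical per-residue features).
    y: labels, +1 = hotspot, -1 = non-hotspot.
    A constant 1.0 is appended to every vector, so the last weight acts
    as a bias term (regularized together with the other weights).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible sample order
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Subgradient of the L2 penalty: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Subgradient of the hinge loss: applied only when the margin is below 1.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (predicted hotspot) or -1 (predicted non-hotspot)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy, clearly separable training data (hypothetical values).
    X = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]

    # Evaluation on hypothetical held-out labels and predictions.
    y_true = [1, 1, 1, 1, -1, -1, -1, -1]
    y_pred = [1, 1, 1, -1, 1, 1, -1, -1]
    print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```

In the evaluation example, 3 of the 4 known hotspots are recovered (R = 3/4 = 0.75) and 3 of the 5 residues predicted as hotspots are real hotspots (P = 3/5 = 0.6). This is the same kind of precision/recall trade-off reported above for the three feature sets.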

Created:

2009-10-09