
Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials

Indexed by NIAID Data Ecosystem on 2026-03-10
Download link:
https://figshare.com/articles/dataset/Contact_prediction_is_hardest_for_the_most_informative_contacts_but_improves_with_the_incorporation_of_contact_potentials/6719312
Description:
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
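The two ideas in the description can be sketched with toy numbers: positive predictive value (PPV) rises when the distance cutoff defining a contact is loosened, and a co-evolution score can be blended with a statistical contact energy before ranking. This is a minimal illustration only. The function names, the linear blend, the `weight` parameter, the cutoffs, and all data values are assumptions invented here; they are not the authors' method or results.

```python
def ppv(ranked_pairs, distances, cutoff, top_n):
    """Fraction of the top_n ranked pairs whose distance (angstroms) is below cutoff."""
    top = ranked_pairs[:top_n]
    hits = sum(1 for pair in top if distances[pair] < cutoff)
    return hits / len(top)


def combine(coevolution, energy, weight=0.5):
    """Hypothetical linear blend: more favorable (lower) energy raises the score."""
    return {pair: coevolution[pair] - weight * energy[pair] for pair in coevolution}


def rank(scores):
    """Return residue pairs sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy data, invented for illustration: residue-pair distances in angstroms,
# co-evolution scores, and statistical contact energies.
distances = {(1, 5): 4.5, (2, 9): 7.5, (3, 12): 11.0, (4, 8): 5.5}
coevolution = {(1, 5): 0.9, (2, 9): 0.8, (3, 12): 0.7, (4, 8): 0.6}
energy = {(1, 5): -0.4, (2, 9): 0.1, (3, 12): 0.5, (4, 8): -0.6}

ranked = rank(coevolution)

# The same ranking scored under a strict and a permissive contact definition.
ppv_strict = ppv(ranked, distances, cutoff=6.0, top_n=2)
ppv_loose = ppv(ranked, distances, cutoff=8.0, top_n=2)

# Re-rank after blending in the contact energy, then score under the strict definition.
ranked_combined = rank(combine(coevolution, energy))
ppv_strict_combined = ppv(ranked_combined, distances, cutoff=6.0, top_n=2)

print(ppv_strict, ppv_loose, ppv_strict_combined)
```

Because every pair closer than the strict cutoff is also closer than the permissive one, PPV under the permissive definition can never be lower for the same ranking. The improvement from blending here is built into the toy data and says nothing about real performance.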

Created: 2018-06-28