Membrane protein contact and structure prediction using co-evolution in conjunction with machine learning

Figshare | Updated 2017-05-24 | Indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/Membrane_protein_contact_and_structure_prediction_using_co-evolution_in_conjunction_with_machine_learning/5037548

Description:

De novo membrane protein structure prediction is limited to small proteins because the conformational search space expands rapidly with sequence length. Long-range contacts (pairs of residues separated by 24 or more positions in sequence but in close proximity in the structure) are arguably the most effective way to restrict this conformational space. Inverse methods for co-evolutionary analysis predict a global set of position-pair couplings that best explain the observed amino acid co-occurrences, thus distinguishing evolutionarily explained co-variances from those arising from spurious transitive effects. Here, we show that applying machine learning approaches and custom descriptors improves evolutionary contact prediction accuracy, raising average precision by 6 percentage points for the top 1L non-local contacts (L being the protein length). Further, we demonstrate that predicted contacts improve protein folding with BCL::Fold: the mean RMSD100 of the top 10 folded models was reduced by an average of 2 Å on a benchmark of 25 membrane proteins.
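The two metrics named in the description can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names, the input layout (a dict of residue-pair scores and a set of true contact pairs), and the default sequence-separation cutoff of 24 are assumptions made for illustration. The RMSD100 formula is the commonly used length normalization RMSD / (1 + ln sqrt(N / 100)); that the dataset uses exactly this definition is also an assumption.

```python
import math


def top_l_precision(scores, true_contacts, length, min_sep=24, factor=1.0):
    """Precision of the top factor*L predicted contacts.

    scores        -- dict mapping (i, j) residue index pairs to a coupling score
                     (hypothetical input layout)
    true_contacts -- set of (i, j) pairs with i < j that are in contact in the
                     reference structure
    length        -- protein length L
    min_sep       -- minimum sequence separation |i - j|; 24 is assumed here
    """
    # Keep only pairs that are far enough apart in sequence.
    candidates = [
        (score, i, j)
        for (i, j), score in scores.items()
        if abs(i - j) >= min_sep
    ]
    # Rank by score, highest first, and keep the top factor*L pairs.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, int(factor * length))]
    if not top:
        return 0.0
    hits = sum(
        1 for _, i, j in top if (min(i, j), max(i, j)) in true_contacts
    )
    return hits / len(top)


def rmsd100(rmsd, n_residues):
    """Normalize an RMSD (in Å) to a reference length of 100 residues."""
    return rmsd / (1.0 + math.log(math.sqrt(n_residues / 100.0)))
```

With this normalization, a 100-residue protein keeps its raw RMSD, while longer proteins receive a smaller RMSD100 for the same raw value, which makes models of different lengths comparable across a benchmark.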

Created:

2017-05-24
