Building a Better Fragment Library for De Novo Protein Structure Prediction

Figshare · 2016-01-15 · Updated 2026-04-29

Download link:

https://figshare.com/articles/dataset/Building_a_Better_Fragment_Library_for_De_Novo_Protein_Structure_Prediction/1391662

Description:

Fragment-based approaches are the current standard for de novo protein structure prediction. These approaches rely on accurate and reliable fragment libraries to generate good structural models. In this work, we describe a novel method for structure fragment library generation and its application in fragment-based de novo protein structure prediction. We demonstrate the importance of correct testing procedures in assessing the quality of fragment libraries. In particular, homologs of the target must be excluded from the libraries to correctly simulate a de novo protein structure prediction scenario, something that, surprisingly, is not always done. We demonstrate that fragments with different predominant predicted secondary structures should be treated differently during the fragment library generation step, and that exhaustive and random search strategies should both be used. This information was used to develop a novel method, Flib. On a validation set of 41 structurally diverse proteins, Flib libraries present both higher precision and higher coverage than two state-of-the-art methods, NNMake and HHFrag. Flib also achieves better precision and coverage on the set of 275 protein domains used in the two previous Critical Assessment of Structure Prediction experiments (CASP9 and CASP10). We compared Flib libraries against NNMake libraries in a structure prediction context: of the 13 cases in which a correct model was generated, Flib models were more accurate than NNMake models in 10. Flib is available for download at: http://www.stats.ox.ac.uk/research/proteins/resources
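The abstract compares libraries by precision and coverage without defining them. The sketch below assumes a commonly used pair of definitions: precision is the fraction of library fragments whose RMSD to the native structure falls below a cutoff, and coverage is the fraction of target residues spanned by at least one such "good" fragment. The 1.5 Å cutoff, the `Fragment` record, and the `precision_and_coverage` function are illustrative assumptions, not Flib's code, and RMSD values are taken as already computed against the native structure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Fragment:
    """One library fragment placed on the target (hypothetical record type)."""
    start: int    # 0-based index of the first target residue the fragment covers
    length: int   # number of residues in the fragment
    rmsd: float   # precomputed RMSD (Angstrom) to the native structure at that position


def precision_and_coverage(
    fragments: Iterable[Fragment],
    target_length: int,
    rmsd_cutoff: float = 1.5,  # assumed "good fragment" threshold, not taken from the source
) -> tuple[float, float]:
    """Return (precision, coverage) of a fragment library.

    precision: fraction of all fragments whose RMSD is below the cutoff.
    coverage:  fraction of target residues covered by at least one good fragment.
    """
    total = 0
    good = 0
    covered = [False] * target_length
    for frag in fragments:
        total += 1
        if frag.rmsd < rmsd_cutoff:
            good += 1
            # Mark every residue spanned by this good fragment, clipped to the target.
            for i in range(max(frag.start, 0), min(frag.start + frag.length, target_length)):
                covered[i] = True
    precision = good / total if total else 0.0
    coverage = sum(covered) / target_length if target_length else 0.0
    return precision, coverage


# Toy library for a hypothetical 20-residue target (made-up RMSD values).
library = [
    Fragment(start=0, length=9, rmsd=0.8),
    Fragment(start=0, length=9, rmsd=2.4),
    Fragment(start=5, length=9, rmsd=1.2),
    Fragment(start=10, length=9, rmsd=3.1),
]
print(precision_and_coverage(library, target_length=20))  # (0.5, 0.7)
```

Here two of the four fragments are "good", and together they span residues 0 to 13, so 14 of 20 residues are covered. The two metrics pull in different directions: adding more candidate fragments per position tends to raise coverage while lowering precision, which is why the abstract reports both.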

Created:

2016-01-15
