Efficient Sampling in Fragment-Based Protein Structure Prediction Using an Estimation of Distribution Algorithm

Indexed from NIAID Data Ecosystem on 2026-03-07

Download link:

https://figshare.com/articles/dataset/_Efficient_Sampling_in_Fragment_Based_Protein_Structure_Prediction_Using_an_Estimation_of_Distribution_Algorithm_/755599

Description:

Fragment assembly is a powerful method of protein structure prediction that builds protein models from a pool of candidate fragments taken from known structures. Stochastic sampling is subsequently used to refine the models. For computational efficiency, the structures are first represented as coarse-grained models and then as all-atom models. Because the sampling methods used to search for the global minimum in a complex energy landscape are stochastic, many models have to be generated independently. In this paper we present a fragment-based approach that shares information between the generated models and steers the search towards native-like regions. A distribution over fragments is estimated from a pool of low-energy all-atom models. This iteratively refined distribution is used to guide the selection of fragments when building models in subsequent rounds of structure prediction. The use of an estimation of distribution algorithm enabled the method to reach lower energy levels and to generate a higher percentage of near-native models. The method uses an all-atom energy function and produces models with atomic resolution. On a benchmark set, we observed an improvement in energy-driven blind selection of models in comparison with the AbInitioRelax protocol.
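
The estimate-then-sample loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `keep_fraction` and `pseudocount` parameters, and the representation of a model as one fragment index per position are all assumptions made for illustration, and the energy function is supplied by the caller as a stand-in for an all-atom energy function.

```python
import random
from collections import Counter


def estimate_fragment_distribution(models, energies, n_fragments,
                                   keep_fraction=0.2, pseudocount=1.0):
    """Estimate per-position fragment probabilities from the lowest-energy models.

    models[m][pos] is the index of the fragment used at position pos in model m.
    keep_fraction and pseudocount are illustrative assumptions, not values
    taken from the paper.
    """
    n_keep = max(1, int(len(models) * keep_fraction))
    ranked = sorted(range(len(models)), key=lambda i: energies[i])
    elite = [models[i] for i in ranked[:n_keep]]

    distribution = []
    for pos in range(len(models[0])):
        counts = Counter(model[pos] for model in elite)
        # The pseudocount keeps every fragment selectable in later rounds.
        weights = [counts[f] + pseudocount for f in range(n_fragments)]
        total = sum(weights)
        distribution.append([w / total for w in weights])
    return distribution


def sample_model(distribution, rng):
    """Draw one fragment index per position from the current distribution."""
    return [rng.choices(range(len(probs)), weights=probs)[0]
            for probs in distribution]


def run_eda(energy_fn, n_positions, n_fragments,
            n_models=200, n_rounds=5, seed=0):
    """Alternate between sampling models and re-estimating the distribution."""
    rng = random.Random(seed)
    # Round one starts from a uniform distribution over fragments.
    distribution = [[1.0 / n_fragments] * n_fragments
                    for _ in range(n_positions)]
    for _ in range(n_rounds):
        models = [sample_model(distribution, rng) for _ in range(n_models)]
        energies = [energy_fn(m) for m in models]
        distribution = estimate_fragment_distribution(
            models, energies, n_fragments)
    return distribution
```

As a toy usage example, `run_eda(lambda m: sum(a != b for a, b in zip(m, [0, 2, 1, 0])), n_positions=4, n_fragments=3)` uses the number of mismatches against a hypothetical target as the energy. Over successive rounds, probability mass should shift towards the fragments that appear in low-energy models, which is the information sharing between models that the abstract describes.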

Created:

2013-07-25
