Generation of Pairwise Potentials Using Multidimensional Data Mining
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Generation_of_Pairwise_Potentials_Using_Multidimensional_Data_Mining/7128122
Description:
The rapid development of molecular structural databases provides
the chemistry community access to an enormous array of experimental
data that can be used to build and validate computational models.
Using radial distribution functions collected from experimentally
available X-ray and NMR structures, a number of so-called statistical
potentials have been developed over the years using the structural
data mining strategy. These potentials have been developed within
the context of the two-particle Kirkwood equation by extending its
original use for isotropic monatomic systems to anisotropic biomolecular
systems. However, the limited accuracy and unclear physical meaning of
statistical potentials have long formed the central arguments against
such methods. In this work, we present a new approach to generate
molecular energy functions using structural data mining. Instead of
employing the Kirkwood equation and introducing the “reference
state” approximation, we model the multidimensional probability
distributions of the molecular system using graphical models and generate
the target pairwise Boltzmann probabilities using Bayesian field
theory. Unlike the current statistical potentials that mimic
the “knowledge-based” potential of mean force (PMF) based on the two-particle Kirkwood
equation, the graphical-model-based structure-derived potential developed
in this study focuses on the generation of lower-dimensional Boltzmann
distributions of atoms through reduction of dimensionality. We have
named this new scoring function GARF, and in this work we focus on
the mathematical derivation of our novel approach followed by validation
studies on its ability to predict protein–ligand interactions.
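
For context, the conventional route that the description contrasts GARF with can be sketched as follows: histogram observed pair distances into a radial distribution function g(r), then apply the inverse Boltzmann relation w(r) = -kT ln g(r). This is a minimal illustration only, not the GARF method and not code from the dataset. The function names, the uniform-density (ideal-gas) reference state, and the kT value are assumptions made for the example.

```python
import math

# Assumption: k_B*T at roughly 298 K, expressed in kcal/mol.
KT_KCAL_PER_MOL = 0.593


def radial_distribution(distances, r_max, n_bins):
    """Estimate g(r) from pair distances (hypothetical helper).

    Counts are normalized by the count expected for a uniform density
    inside a sphere of radius r_max, i.e. an ideal-gas reference state.
    """
    width = r_max / n_bins
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # Clamp the index to guard against floating-point round-up.
            counts[min(int(d / width), n_bins - 1)] += 1
    total = sum(counts)
    total_volume = (4.0 / 3.0) * math.pi * r_max ** 3
    g = []
    for i, count in enumerate(counts):
        r_lo, r_hi = i * width, (i + 1) * width
        shell_volume = (4.0 / 3.0) * math.pi * (r_hi ** 3 - r_lo ** 3)
        expected = total * shell_volume / total_volume
        g.append(count / expected if expected > 0 else 0.0)
    return g


def inverse_boltzmann(g, kT=KT_KCAL_PER_MOL):
    """Convert g(r) into a potential of mean force, w(r) = -kT ln g(r).

    Bins with g(r) = 0 were never observed and are mapped to +infinity.
    """
    return [-kT * math.log(x) if x > 0 else float("inf") for x in g]


if __name__ == "__main__":
    # Toy distances in angstroms, purely illustrative.
    toy = [0.5, 1.5, 1.5]
    g_r = radial_distribution(toy, r_max=2.0, n_bins=2)
    print(inverse_boltzmann(g_r))
```

The choice of reference state in `radial_distribution` is exactly the approximation the description says the graphical-model approach is meant to avoid, so the sketch is useful mainly as a baseline for comparison.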
Created:
2018-09-05



