
Generation of Pairwise Potentials Using Multidimensional Data Mining

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Generation_of_Pairwise_Potentials_Using_Multidimensional_Data_Mining/7128122
Description:
The rapid development of molecular structural databases gives the chemistry community access to an enormous array of experimental data that can be used to build and validate computational models. Using radial distribution functions collected from experimentally available X-ray and NMR structures, a number of so-called statistical potentials have been developed over the years through structural data mining. These potentials were developed within the framework of the two-particle Kirkwood equation, extending its original use for isotropic monatomic systems to anisotropic biomolecular systems. However, their limited accuracy and unclear physical meaning have long been the central arguments against such methods. In this work, we present a new approach to generating molecular energy functions using structural data mining. Instead of employing the Kirkwood equation and introducing the “reference state” approximation, we model the multidimensional probability distributions of the molecular system using graphical models and generate the target pairwise Boltzmann probabilities using Bayesian field theory. Unlike current statistical potentials, which mimic the “knowledge-based” potential of mean force (PMF) derived from the two-particle Kirkwood equation, the graphical-model-based, structure-derived potential developed in this study focuses on generating lower-dimensional Boltzmann distributions of atoms through dimensionality reduction. We have named this new scoring function GARF; in this work we focus on the mathematical derivation of our novel approach, followed by validation studies of its ability to predict protein–ligand interactions.
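For context, the conventional reference-state statistical potential that GARF is contrasted with turns distance distributions into energies by Boltzmann inversion, u(r) = -k_B T ln(p_obs(r) / p_ref(r)). Here p_obs is the observed pair-distance distribution and p_ref is the distribution expected in an assumed reference state. The sketch below illustrates only that conventional scheme. It is not GARF's graphical-model and Bayesian-field-theory method. The function names `distance_histogram` and `inverse_boltzmann_potential`, the pseudocount smoothing, the 300 K default temperature, and the kcal/mol units are all illustrative assumptions.

```python
import math

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def distance_histogram(distances, r_max, n_bins):
    """Hypothetical helper: count pair distances in equal-width bins on [0, r_max)."""
    width = r_max / n_bins
    counts = [0] * n_bins
    for d in distances:
        if 0.0 <= d < r_max:
            # Clamp guards against floating-point rounding at the upper edge.
            counts[min(int(d / width), n_bins - 1)] += 1
    return counts


def inverse_boltzmann_potential(observed_counts, reference_counts,
                                temperature=300.0, pseudocount=1.0):
    """Hypothetical sketch of a conventional reference-state statistical potential:
    u(r) = -k_B T ln(p_obs(r) / p_ref(r)) per distance bin (this is not GARF)."""
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must share the same distance bins")
    n_bins = len(observed_counts)
    # Pseudocounts keep empty bins from producing log(0).
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    kt = K_B * temperature
    potential = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        p_obs = (n_obs + pseudocount) / obs_total
        p_ref = (n_ref + pseudocount) / ref_total
        potential.append(-kt * math.log(p_obs / p_ref))
    return potential


# Toy distances (in angstroms) standing in for mined structural data.
observed = distance_histogram([3.6, 3.8, 3.9, 5.2, 7.5], r_max=8.0, n_bins=4)
reference = distance_histogram([1.0, 2.5, 4.1, 5.5, 6.9], r_max=8.0, n_bins=4)
print(inverse_boltzmann_potential(observed, reference))
```

Bins where a pair distance occurs more often than in the reference state receive negative (favorable) energies. The result therefore depends directly on how the reference state is chosen, which is the approximation the abstract says GARF avoids.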

Created: 2018-09-05