rMD17-aq dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10048643
Resource description:
The rMD17-aq dataset:
Citation:
Jas Kalayan, Ismaeel Ramzan, Christopher D. Williams, Neil A. Burton and Richard A. Bryce, "A neural network potential based on pairwise resolved atomic forces and energies", publication TBC
Description:
QM/MM aqueous simulations were performed for the 10 molecules from the original MD17 dataset by Chmiela et al. (and the revised dataset by Christensen et al.), each solvated in 400 SPC/E water molecules. Each simulation was run for 100 ps at a temperature of 500 K and a pressure of 1 atm. The solute conformations sampled from the QM/MM simulations in CP2K were then used to recalculate the forces and energies of each conformation in Gaussian with a denser integration grid, which effectively removes numerical noise.
We also include an 11th entry, a higher-energy conformer of salicylic acid (directory name: salicylic_high_energy_conformer), in addition to the lower-energy conformer sampled in the MD17 dataset.
For each molecule (excluding all surrounding water molecules), this dataset contains the nuclear charges, coordinates (Angstrom), forces (kcal/mol/Angstrom), energies (kcal/mol) and partial atomic charges (atomic units), stored as space-separated text files written with the numpy savetxt function.
The data:
The files in each molecule directory are:
'nuclear_charges.txt' : The nuclear charges for each atom in a molecule.
'coords.txt' : The Cartesian coordinates for each atom in a conformation (Angstrom units)
'energies.txt' : The total energy of each conformation (kcal/mol units)
'forces.txt' : The Cartesian forces for each atom in a conformation (kcal/mol/Angstrom units)
'charges.txt' : The partial atomic charges fitted to the electrostatic potential (ESP) (atomic units)
'molecules.prmtop' : The Amber formatted topology file containing the MM parameters for water molecules (solute MM parameters are not used)
'minimised.rst.pdb' : The initial coordinates of a minimised system used to perform QM/MM simulations in CP2K
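As a minimal sketch of how the text files listed above might be read back, the following uses numpy loadtxt, the counterpart of savetxt. The function name load_molecule is hypothetical, and the array layout is an assumption that is not stated in the description: nuclear_charges.txt is taken to hold one value per atom, and the per-conformation files are taken to be ordered conformation by conformation, then atom by atom, then x, y, z. Check the shapes against the actual files before relying on it.

```python
from pathlib import Path

import numpy as np


def load_molecule(directory):
    """Load one molecule directory into a dict of numpy arrays.

    ASSUMPTION: nuclear_charges.txt holds one value per atom, and the
    per-conformation files are ordered conformation-major (then atom,
    then x/y/z). The dataset description does not state the layout.
    """
    directory = Path(directory)

    # One nuclear charge per atom; ravel() also copes with a single-atom file.
    z = np.rint(np.loadtxt(directory / "nuclear_charges.txt")).astype(int).ravel()
    n_atoms = z.size

    # Reshape so the arrays are indexed as [conformation, atom, xyz].
    coords = np.loadtxt(directory / "coords.txt").reshape(-1, n_atoms, 3)
    forces = np.loadtxt(directory / "forces.txt").reshape(-1, n_atoms, 3)
    energies = np.loadtxt(directory / "energies.txt").ravel()
    charges = np.loadtxt(directory / "charges.txt").reshape(-1, n_atoms)

    # Every per-conformation array should describe the same conformations.
    n_conf = {len(coords), len(forces), len(energies), len(charges)}
    if len(n_conf) != 1:
        raise ValueError(f"inconsistent number of conformations: {sorted(n_conf)}")

    return {
        "nuclear_charges": z,   # dimensionless
        "coords": coords,       # Angstrom
        "forces": forces,       # kcal/mol/Angstrom
        "energies": energies,   # kcal/mol
        "charges": charges,     # atomic units
    }
```

For example, load_molecule("salicylic_high_energy_conformer")["coords"] would then be indexed as [conformation, atom, xyz], provided the layout assumption holds.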
The input data:
The input files to perform simulations and single point energy calculations are provided in the '_cp2k_gaussian_example_inputs' directory. These files are:
'cp2k-qmmm-example.inp' : input file for the QM/MM simulations performed with CP2K. The number of QM atoms of each kind is given as a placeholder: CCC, OOO, HHH and NNN stand for the number of carbon, oxygen, hydrogen and nitrogen atoms, respectively, in a solute molecule. The system dimensions placeholder XXYYZZ can be replaced with the BOX_DIMENSIONS in the molecules.prmtop file.
'def2-svp.1.cp2k' : the basis set used in QM/MM simulations
'gaussain_input.com': an example of a Gaussian input file for single point energy calculations for aspirin.
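The placeholder substitution described for 'cp2k-qmmm-example.inp' could be scripted roughly as follows. This is a sketch under assumptions: the helper name fill_atom_counts is hypothetical, the element counts are derived from the nuclear charges (6, 8, 1 and 7 for C, O, H and N), and each placeholder is replaced by plain string substitution wherever it occurs in the template. The XXYYZZ placeholder is left untouched because the exact format expected for the box dimensions is not described here.

```python
from collections import Counter

# Nuclear charge -> placeholder named in the dataset description.
PLACEHOLDER_BY_Z = {6: "CCC", 8: "OOO", 1: "HHH", 7: "NNN"}


def fill_atom_counts(template, nuclear_charges):
    """Replace CCC/OOO/HHH/NNN in a template string with atom counts.

    ASSUMPTION: a plain text substitution is sufficient, i.e. the
    placeholders do not occur anywhere else in the template.
    """
    counts = Counter(int(round(z)) for z in nuclear_charges)
    filled = template
    for z, placeholder in PLACEHOLDER_BY_Z.items():
        # Elements absent from the solute get a count of 0.
        filled = filled.replace(placeholder, str(counts.get(z, 0)))
    return filled


# Aspirin (C9H8O4) as an illustration; the template line is made up.
aspirin_z = [6] * 9 + [8] * 4 + [1] * 8
print(fill_atom_counts("C: CCC, O: OOO, H: HHH, N: NNN", aspirin_z))
```

Whether a kind with a count of zero should stay in the CP2K input or be removed altogether is not specified in the description, so the filled file should be checked by hand for solutes without nitrogen.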
Created:
2023-10-28



