Molecular geometries and energies from quantum mechanical calculations and small molecule force field evaluations.
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.
Download link:
https://zenodo.org/record/4247859
Description:
Force fields are used in a wide variety of contexts for classical molecular simulation, including studies on protein-ligand binding, membrane permeation, and thermophysical property prediction. The quality of these studies relies on the quality of the force fields used to represent the systems. Focusing on small molecules of fewer than 50 heavy atoms, this dataset compares nine force fields: GAFF, GAFF2, MMFF94, MMFF94S, OPLS3e, SMIRNOFF99Frosst, and the Open Force Field Parsley versions 1.0, 1.1, and 1.2. On a dataset comprising 22,675 molecular structures of 3,271 molecules, we analyzed force field-optimized geometries and conformer energies compared to reference quantum mechanical (QM) data. The data were created using scripts from the benchmarkff GitHub repository.

A corresponding manuscript has been submitted, and a preprint is available on ChemRxiv: Lim, Victoria T.; Hahn, David F.; Tresadern, Gary; Bayly, Christopher I.; Mobley, David (2020): Benchmark Assessment of Molecular Geometries and Energies from Small Molecule Force Fields. ChemRxiv. Preprint.

Read below or see the file README.md for further information and a description of the contents:

# README
Version: 04 Nov 2020
For Python scripts that are NOT found in these directories, please check the
[BenchmarkFF Github repo](https://github.com/MobleyLab/benchmarkff/tree/master/tools).
## Procedure
1. Prepare the OPLS3e file for analysis: standardize its format with OpenEye tools in case of differences,
and convert energies from kJ/mol to kcal/mol.
```
cd 00_prep
python convert_extension.py -i opls3e_minimized.sd -o opls3e.sdf
```
2. Remove molecules that could not be parameterized by every force field, so that all force fields are compared on the same set.
```
python get_by_tag.py -i opls3e.sdf -s "SMILES QCArchive" -list trim3.txt -o trim3_full_opls3e.sdf
```
3. Run analysis.
```
conda activate parsley
# calc ddE, RMSD, and TFD distributions
python compare_ffs.py -i match.in -t 'SMILES QCArchive' --plot > metrics.out
# match_minima, only in 01_analysis_all and 02_analysis_all_smaller_cutoff
python match_minima.py -i match.in --plot --cutoff 1.0 --readpickle
# look at specific subsets, only in 01_analysis_all
python color_by_moiety.py -i match.in -p metrics.pickle -s N-N.dat azetidine.dat octahydrotetracene.dat -o scatter_tfd_3_
# look at outliers, only in 01_analysis_all and 02_analysis_all_smaller_cutoff
python tailed_parameters.py -i refdata_trim_overlap_full_openff_unconstrained-1.2.0.sdf -f <offxml file> --metric 'TFD' --cutoff 0.12 --tag "TFD to trim_overlap_full_qcarchive.sdf" --tag_smiles "SMILES QCArchive" > output_tfd.dat
```
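The ddE metric from step 3 can be illustrated with a small sketch. It assumes that ddE is the force field relative conformer energy minus the QM relative conformer energy, with both taken relative to the conformer that is lowest in QM energy. That definition, the function names, and the sample numbers are illustrative assumptions and are not taken from `compare_ffs.py`. The unit conversion mirrors the kJ/mol to kcal/mol step applied to the OPLS3e file in step 1.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie: 1 kcal = 4.184 kJ


def kj_to_kcal(energies_kj):
    """Convert a list of energies from kJ/mol to kcal/mol."""
    return [e / KJ_PER_KCAL for e in energies_kj]


def relative_energies(energies, ref_index):
    """Return energies relative to the conformer at ref_index."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def dde(ff_energies, qm_energies):
    """ddE per conformer: FF relative energy minus QM relative energy.

    Assumption: both sets are referenced to the conformer with the
    lowest QM energy.
    """
    if len(ff_energies) != len(qm_energies):
        raise ValueError("need one FF and one QM energy per conformer")
    ref_index = min(range(len(qm_energies)), key=qm_energies.__getitem__)
    ff_rel = relative_energies(ff_energies, ref_index)
    qm_rel = relative_energies(qm_energies, ref_index)
    return [f - q for f, q in zip(ff_rel, qm_rel)]


# Made-up energies for three conformers of one molecule
qm_kcal = [0.0, 1.2, 2.5]                      # hypothetical QM energies, kcal/mol
ff_kcal = kj_to_kcal([41.84, 50.208, 48.116])  # hypothetical FF energies, kJ/mol
print(dde(ff_kcal, qm_kcal))
```

A positive value means the force field places that conformer higher above the reference than QM does; a negative value means it places it lower.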
## Brief description of contents
* High level:
```
.
├── 00_prep
│ ├── convert_extension.py
│ ├── opls3e_minimized.sd OPLS3e minimized structures from Schrodinger Maestro
│ ├── opls3e.sdf standardized through OpenEye tools
│ ├── opt_openff*.sdf OpenFF minimized conformations
├── 01_analysis_all compare all ffs (qm, GAFF(2), MMFF94(S), Smirnoff, OpenFF-X.X, OPLS3e)
├── 02_analysis_all_smaller_cutoff compare all ffs (qm, GAFF(2), MMFF94(S), Smirnoff, OpenFF-X.X, OPLS3e) with a smaller cutoff of 0.3 for match_minima
├── 03_analysis_latest_ffs compare only the latest versions of ffs (qm, GAFF2, MMFF94S, OpenFF-1.2, OPLS3e)
├── 04_analysis_openff_only compare only OpenFF ffs (qm, Smirnoff, OpenFF-X.X)
└── README.md
```
* Inside an output directory:
```
YY_analysis_* various output files of above mentioned scripts, some are listed and described below:
├── bar*.png parameter coverage bar plots
├── ddE.dat relative energies data
├── fig_density_*.png scatter plots of ddE vs (RMSD or TFD) for each force field
├── match.in input file for compare_ffs.py
├── metrics.out output file for compare_ffs.py
├── metrics.pickle pickle file for compare_ffs.py -- you can read this into compare_ffs instead of rerunning the full analysis
├── refdata_*.sdf output SDF files with stored RMSD / TFD scores with reference to QM for each structure
├── relene_*.dat relative energies of matched conformers
├── ridge_dde.png compared energies plot
├── ridge_rmsd.svg compared rmsds plot
├── ridge_tfd.svg compared tfds plot
├── fig_scatter_*.png scatter plots of ddE vs (RMSD or TFD). these are noisy; I don't use these
├── trim3_*.sdf input SDF files for compare_ffs.py listed in match.in file
├── violin*.* violin plot showing ddE distributions
```
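Because the `refdata_*.sdf` files store the RMSD / TFD scores as SD data fields, they can be inspected without the analysis scripts. The sketch below is a minimal pure-Python reader for SD data fields, plus a filter that keeps records above a cutoff. It assumes each score is stored as a single number under the tag; the function names and the sample records are hypothetical, and the tag names are the ones used in the `tailed_parameters.py` command above. This is not how the repository scripts read the files.

```python
def read_sd_tags(sdf_text):
    """Return one dict of SD data fields per record in an SDF string."""
    records = []
    for block in sdf_text.split("$$$$"):  # "$$$$" terminates each record
        if not block.strip():
            continue
        tags = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            # Data header lines look like ">  <TAG NAME>"
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                value_lines = []
                i += 1
                # The value runs until the next blank line
                while i < len(lines) and lines[i].strip():
                    value_lines.append(lines[i])
                    i += 1
                tags[name] = "\n".join(value_lines)
            i += 1
        records.append(tags)
    return records


def above_cutoff(records, tag, cutoff):
    """Keep records whose numeric value for `tag` exceeds `cutoff`."""
    return [r for r in records if tag in r and float(r[tag]) > cutoff]


TFD_TAG = "TFD to trim_overlap_full_qcarchive.sdf"

# Two made-up records with placeholder molblocks
sample = """mol_a
  placeholder molblock

M  END
>  <SMILES QCArchive>
CCO

>  <TFD to trim_overlap_full_qcarchive.sdf>
0.05

$$$$
mol_b
  placeholder molblock

M  END
>  <SMILES QCArchive>
CCN

>  <TFD to trim_overlap_full_qcarchive.sdf>
0.31

$$$$
"""

records = read_sd_tags(sample)
for rec in above_cutoff(records, TFD_TAG, 0.12):
    print(rec["SMILES QCArchive"], rec[TFD_TAG])
```

For real files, a cheminformatics toolkit is the safer choice, since this reader ignores the molblock entirely and does no validation.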
Created:
2023-06-28



