QURES, a dataset of the properties of pharmacologically active molecules calculated at different levels of quantum chemical theory

DataCite Commons. Updated 2025-11-05; indexed 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=a7f4c9b4df0e41e2b553655e5adf5ca8
Description:
We present QURES, a dataset containing the results of quantum chemical calculations for 9916 pharmacologically active organic molecules, including approved and experimental drugs. These molecules were derived from the structures of pharmaceuticals reported in DrugBank and PubChem by selecting and neutralizing the largest fragment of each structure. The resulting molecules were then filtered to retain only those with no more than 35 carbon atoms, no more than 50 heavy atoms in total, no net charge, and no unpaired electrons; duplicates and enantiomers were also excluded, among other criteria. The starting structure for the quantum chemical calculations was the lowest-energy conformer selected by the iMTD-GC algorithm in the CREST program. CREST calculations of conformational ensembles were completed successfully for 7192 molecules (the part1_search_conf subset); for the remaining molecules, the structure of the lowest-energy conformer and the properties of the conformational ensemble were taken from the previously published GEOM dataset (the part2_geom_conf and part3_geom_enant_conf subsets). Quantum chemical calculations were then performed at three levels of theory: the semiempirical tight-binding method GFN2-xTB (geometry optimization and calculation of vibrational frequencies and thermochemical properties); the composite DFT method r2SCAN-3c (further optimization and calculation of vibrational frequencies and thermochemical properties); and the range-separated hybrid functional ωB97X-D4 with the def2-TZVP basis set (single-point energy calculations). The results obtained for this large set of carefully prepared, medicinally relevant molecules can provide insight into their structure-property relationships and the mechanisms of their interactions in vivo.
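
The size, charge, and spin criteria above can be illustrated with a minimal, stdlib-only Python sketch. It assumes each candidate has already been reduced upstream to its largest, neutralized fragment and is described by a hypothetical record (`Candidate`, with illustrative fields `key`, `formula`, `charge`, and `unpaired`; these are not the QURES schema). It covers only the element-count thresholds, the neutral closed-shell requirement, and exact-duplicate removal by an assumed canonical key; it is not the authors' pipeline and does not reproduce their enantiomer handling.

```python
import re
from dataclasses import dataclass

MAX_CARBON = 35  # at most 35 carbon atoms
MAX_HEAVY = 50   # at most 50 non-hydrogen atoms in total

# Hypothetical record; field names are illustrative, not taken from QURES.
@dataclass(frozen=True)
class Candidate:
    key: str       # assumed canonical identifier used to detect exact duplicates
    formula: str   # flat molecular formula of the largest fragment, e.g. "C17H19NO3"
    charge: int    # net formal charge
    unpaired: int  # number of unpaired electrons

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")

def element_counts(formula: str) -> dict[str, int]:
    """Parse a flat formula (no parentheses or charges) into element counts."""
    counts: dict[str, int] = {}
    for symbol, num in _ELEMENT.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(num) if num else 1)
    return counts

def passes_filters(c: Candidate) -> bool:
    """Apply the carbon/heavy-atom limits and require a neutral closed shell."""
    counts = element_counts(c.formula)
    carbons = counts.get("C", 0)
    # Hydrogen isotopes (H, D, T) are not counted as heavy atoms.
    heavy = sum(n for el, n in counts.items() if el not in ("H", "D", "T"))
    return (carbons <= MAX_CARBON and heavy <= MAX_HEAVY
            and c.charge == 0 and c.unpaired == 0)

def select(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the first passing occurrence of each key, in input order."""
    seen: set[str] = set()
    kept: list[Candidate] = []
    for c in candidates:
        if c.key in seen or not passes_filters(c):
            continue
        seen.add(c.key)
        kept.append(c)
    return kept

if __name__ == "__main__":
    mols = [
        Candidate("morphine", "C17H19NO3", 0, 0),
        Candidate("morphine", "C17H19NO3", 0, 0),  # exact duplicate
        Candidate("tetramethylammonium", "C4H12N", 1, 0),  # charged
    ]
    print([c.key for c in select(mols)])  # ['morphine']
```

In a real workflow the formula, charge, and radical count would come from a cheminformatics toolkit rather than being supplied by hand.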
The availability of data at several levels of theory makes it possible to train machine learning models that raise the accuracy of computationally inexpensive methods to that of higher-tier, much more time-consuming ones.

The molecular properties calculated by quantum chemical methods are compiled in the props.csv file (10.73 MB). The properties of the conformer-rotamer ensembles produced by CREST are provided in the ensemble_props.csv file (1.64 MB), and the CREST output files are collected in the crest.7z archive (2.19 GB). The correspondence between the initial structures of the pharmaceuticals and the processed, standardized molecules (including those that were filtered out or whose calculations failed) is given in the drugs.csv file (4.33 MB). The data fields in the .csv files are explained in the annotation.pdf file (113.65 kB).

Optimized geometries and additional calculated properties that may be useful for machine learning tasks are available in the QURES1-main.7z archive (geometries, xtb output, and ORCA property reports; 736.26 MB) and the QURES2-full_outfiles.7z archive (full ORCA output files; 1.52 GB). The QURES3-wfns.7z archive (143.89 GB) contains wavefunctions, electron densities, and xtb electrostatic potentials. All other files produced during the calculations, including the outputs of r2SCAN-3c calculations that yielded imaginary frequencies, are collected in the QURES4-other.7z archive (34.12 GB). The input files for the conformer-rotamer ensemble searches and the quantum chemical calculations are provided in the input.zip archive (21.27 MB).
Provided by:
Science Data Bank
Created:
2025-11-05