RotconML: A theoretical dataset for machine learning of spectroscopic parameters
Indexed in NIAID Data Ecosystem: 2026-03-12
Download link:
https://zenodo.org/record/4064088
Description:
This dataset comprises ~83,000 small organic molecules composed of H, C, O, and N, with structures optimized and harmonic frequencies calculated at the ωB97X-D/6-31+G(d) level of theory using Gaussian 16.
The dataset is intended for training machine learning models, in particular for rotational spectroscopy, where the goal is to identify unknown molecules from their spectroscopic parameters. Details of the model and the data are given in the paper cited below.
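As a rough illustration of the identification task, the sketch below ranks library entries by how closely their rotational constants match a set of observed ones. This is a naive nearest-match baseline swapped in for illustration only; it is not the probabilistic deep learning model described in the paper. The function name `rank_candidates`, the (A, B, C) tuple layout in MHz, and all example values are hypothetical.

```python
from typing import Dict, List, Tuple

# (A, B, C) rotational constants in MHz. This layout is an assumption.
Constants = Tuple[float, float, float]


def rank_candidates(
    observed: Constants,
    library: Dict[str, Constants],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library entries by the mean absolute percentage deviation of
    their (A, B, C) from the observed constants; smaller is a better match.

    Hypothetical helper: a naive baseline, not the paper's model.
    Assumes all observed constants are non-zero.
    """
    scored = []
    for name, calc in library.items():
        # Relative deviation of each constant, averaged over A, B, C.
        dev = sum(abs(c - o) / o for c, o in zip(calc, observed)) / len(observed)
        scored.append((name, 100.0 * dev))
    # Best matches first.
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


if __name__ == "__main__":
    # Placeholder entries with made-up constants, not taken from the dataset.
    library = {
        "candidate_1": (6000.0, 3000.0, 2000.0),
        "candidate_2": (6100.0, 3050.0, 2030.0),
        "candidate_3": (9000.0, 1000.0, 900.0),
    }
    for name, dev in rank_candidates((6000.0, 3000.0, 2000.0), library):
        print(f"{name}: {dev:.2f} %")
```

In practice the library would be built from the dataset's rotational-constant columns. Ranking by relative deviation, rather than filtering on a fixed tolerance, keeps near misses visible when no entry matches exactly.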
This combination of electronic structure method and basis set was benchmarked in earlier work, where it gave relatively low uncertainties in the predicted rotational constants. Through a cancellation of errors, its equilibrium rotational constants come out very close to the vibrationally averaged (experimental) values. More details can be found in that benchmark paper.
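For reference, the standard textbook relation (not taken from the cited papers) between the equilibrium constant $B_e$, which is what this dataset computes, and the vibrationally averaged constant $B_v$ measured in vibrational state $v$ is:

$$
B_v = B_e - \sum_{r} \alpha_r^{B}\left(v_r + \frac{d_r}{2}\right),
\qquad
B_0 = B_e - \frac{1}{2}\sum_{r} d_r\,\alpha_r^{B}
$$

Here $\alpha_r^{B}$ is the vibration-rotation interaction constant of normal mode $r$, $v_r$ its vibrational quantum number, and $d_r$ its degeneracy; analogous expressions hold for $A$ and $C$. Experimental ground-state constants correspond to $B_0$, so the cancellation of errors amounts to the method's error in $B_e$ being close to the vibrational correction $-\tfrac{1}{2}\sum_r d_r\,\alpha_r^{B}$, which places computed $B_e$ values near the experimental $B_0$.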
The dataset is provided as a comma-separated values (CSV) file, which is awkward to read as plain text; I recommend loading it with the `pandas` Python package and working with it as a DataFrame instead. The columns include: rotational constants, moments of inertia and derived quantities (such as the inertial defect and asymmetry parameter), harmonic frequencies and intensities, dipole moments, the zero-point energy, the electronic energy, the Cartesian coordinates, the SMILES string, the final energy difference after optimization, and the molecular mass.
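A minimal sketch of that workflow follows, assuming the rotational constants are stored in MHz under columns named `A`, `B` and `C` (hypothetical names and units; check the actual header). The CSV already contains moments of inertia, the inertial defect and the asymmetry parameter, so the hypothetical helper `add_derived` is mainly useful for seeing how those quantities follow from A, B and C, or for cross-checking the stored columns.

```python
import io

import pandas as pd

# B = h / (8 pi^2 I): with B in MHz and I in amu*Angstrom^2, I = 505379.0 / B.
MHZ_TIMES_AMU_ANG2 = 505379.0

# HYPOTHETICAL column names for A, B, C (assumed MHz); check the real CSV header.
ROT_COLS = ("A", "B", "C")


def load_rotcon(path_or_buffer) -> pd.DataFrame:
    """Read the dataset CSV (a file path or file-like object) into a DataFrame."""
    return pd.read_csv(path_or_buffer)


def add_derived(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with moments of inertia, inertial defect and Ray's
    asymmetry parameter recomputed from A >= B >= C (MHz)."""
    out = df.copy()
    a, b, c = (out[col] for col in ROT_COLS)
    # Principal moments of inertia (amu*Angstrom^2), I_a <= I_b <= I_c.
    out["I_a"] = MHZ_TIMES_AMU_ANG2 / a
    out["I_b"] = MHZ_TIMES_AMU_ANG2 / b
    out["I_c"] = MHZ_TIMES_AMU_ANG2 / c
    # Inertial defect: zero for a rigid planar structure, negative otherwise.
    out["inertial_defect"] = out["I_c"] - out["I_a"] - out["I_b"]
    # Ray's asymmetry parameter: -1 for a prolate top, +1 for an oblate top.
    # pandas yields NaN for a spherical top (A == B == C).
    out["kappa"] = (2.0 * b - a - c) / (a - c)
    return out


if __name__ == "__main__":
    # Made-up, planar-by-construction values (1/C = 1/A + 1/B), not real data.
    toy = io.StringIO("smiles,A,B,C\nplaceholder,6000.0,3000.0,2000.0\n")
    df = add_derived(load_rotcon(toy))
    print(df[["I_a", "I_b", "I_c", "inertial_defect", "kappa"]])
    # For the real file: df = load_rotcon("path/to/dataset.csv")  (hypothetical path)
```

The toy row satisfies 1/C = 1/A + 1/B, which corresponds to a rigid planar structure, so its inertial defect is zero to within floating-point rounding and kappa is -0.5.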
For more details, see the papers referenced above or contact the author. If you use this dataset in your research or work, please cite this Zenodo entry and the following reference:
McCarthy, M.; Lee, K. L. K. Molecule Identification with Rotational Spectroscopy and Probabilistic Deep Learning. J. Phys. Chem. A 2020, 124 (15), 3002–3017. https://doi.org/10.1021/acs.jpca.0c01376.
Created:
2020-10-03



