OrbNet Denali Training Data
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/OrbNet_Denali_Training_Data/14883867
Description:
This repository contains the data for the paper "OrbNet Denali: A machine learning potential for biological and
organic chemistry with semi-empirical cost and DFT accuracy". The data set consists of geometries of molecules
and the corresponding energy labels calculated at the DFT and semi-empirical levels.
Citation
Anders S. Christensen(1,a), Sai Krishna Sirumalla(1,a), Zhuoran
Qiao(2), Michael B. O'Connor(1), Daniel G. A. Smith(1), Feizhi Ding(1),
Peter J. Bygrave(1), Animashree Anandkumar(3,4), Matthew Welborn(1),
Frederick R. Manby(1), and Thomas F. Miller III(1,2)
"OrbNet Denali: A machine learning potential for biological and organic
chemistry with semi-empirical cost and DFT accuracy" (2021) https://arxiv.org/abs/2107.00299
a) Indicates equal contribution
1) Entos, Inc., Los Angeles, CA 90027, USA
2) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
3) Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, CA 91125, USA
4) NVIDIA, Santa Clara, CA 95051, USA
Contents
The following files are included:
| Filename | Description | MD5 checksum |
|---|---|---|
| denali_labels.tar.gz | .csv file with energy labels and other metadata | bc9b612f75373d1d191ce7493eebfd62 |
| denali_xyz_files.tar.gz | Archive with .xyz geometry files | edd35e95a018836d5f174a3431a751df |
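The checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch, assuming both archives were saved into a single directory; the function names md5sum and verify_download are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file list above.
EXPECTED_MD5 = {
    "denali_labels.tar.gz": "bc9b612f75373d1d191ce7493eebfd62",
    "denali_xyz_files.tar.gz": "edd35e95a018836d5f174a3431a751df",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory="."):
    """Map each archive name to True (match), False (mismatch), or None (file missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5sum(path) == expected) if path.exists() else None
    return results
```

Calling verify_download("path/to/downloads") returns a dictionary with one entry per archive, so a False value points to a corrupted or incomplete file.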
Geometry data
The geometries are stored in XYZ+ format, which is compatible with the standard .xyz format but additionally has the
charge and spin multiplicity annotated in the comment (2nd) line. The coordinates are in units of ångström.
For example, a water molecule with a charge of 0 and a spin-multiplicity of 1 (i.e. singlet) can be specified in this format as:
3
0 1
O -1.08201 1.07900 -0.02472
H -0.09268 1.08664 0.01745
H -1.37137 1.24781 0.90715
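A reader for this format can be sketched in a few lines of Python. The sketch assumes the comment line holds the charge first and the multiplicity second, as the water example suggests; the Geometry class and parse_xyz_plus function are hypothetical names, not part of any released tooling.

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    charge: int
    multiplicity: int
    symbols: list  # element symbols, one per atom
    coords: list   # (x, y, z) tuples in ångström


def parse_xyz_plus(text):
    """Parse one XYZ+ record: atom count, 'charge multiplicity' line, then atoms."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # Assumed field order on the comment line: charge, then multiplicity.
    charge_str, mult_str = lines[1].split()[:2]
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        symbols.append(symbol)
        coords.append((float(x), float(y), float(z)))
    if len(symbols) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(symbols)}")
    return Geometry(int(charge_str), int(mult_str), symbols, coords)
```

Because the first and third-onward lines follow the standard .xyz layout, tools that ignore the comment line should still be able to read these files.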
The directory structure of the geometry data contained within denali_xyz_files.tar.gz is as follows:
xyz_files/
├── mol_id1/
│ ├──sample_id0.xyz
│ ├──sample_id1.xyz
│ ├──sample_id2.xyz
│ ├──sample_id3.xyz
│ └──sample_id4.xyz
├── mol_id2/
│ ├──sample_id0.xyz
│ ├──sample_id1.xyz
│ ├──sample_id2.xyz
│ └──sample_id3.xyz
├── ... etc
Each mol_id uniquely identifies a molecule, with the various conformer geometries for that molecule stored in the corresponding folder.
Those geometries are in turn identified by a unique sample_id.
Grouping the geometries by mol_id is used in the OrbNet loss function; see Eq. 3 in the paper.
Note that not all molecules have multiple geometries.
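The folder layout described above maps directly onto a dictionary from molecule to conformers. A minimal sketch, assuming the archive has been extracted so that xyz_files/ exists on disk; index_geometries is an illustrative name.

```python
from pathlib import Path


def index_geometries(root):
    """Return {mol_id: [sample_id, ...]} built from the extracted directory tree."""
    index = {}
    for mol_dir in sorted(Path(root).iterdir()):
        if mol_dir.is_dir():
            # Each file stem is the sample_id of one conformer geometry.
            index[mol_dir.name] = sorted(p.stem for p in mol_dir.glob("*.xyz"))
    return index
```

For example, index_geometries("xyz_files") yields one key per molecule folder; molecules with a single geometry simply map to a one-element list.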
Training labels
The training labels (i.e. the wB97X-D3/def2-TZVP and GFN1-xTB
energies, in units of Hartree) and the training and test/validation splits are provided in
the file denali_labels.csv. All molecules are singlet states.
The .csv file contains the following columns:
| Column | Description |
|---|---|
| sample_id | A unique hash generated from the QM input, also corresponds to the .xyz filename of that geometry |
| subset | The data source for that geometry, please refer to the paper for a detailed description of the various subsets |
| mol_id | Identifier for the parent molecule |
| test_set | True if the geometry is part of the test/validation set of neutral molecules |
| test_set_plus | True if the geometry is part of the test/validation set of charged molecules |
| prelim_1 | True if the geometry is part of the 10% OrbNet Denali training set |
| training_set_plus | True if the geometry is part of the full OrbNet Denali training set |
| charge | The charge of the molecule |
| dft_energy | wB97X-D3/def2-TZVP energy calculated with Qcore 0.8.17 in Hartree |
| xtb1_energy | GFN1-xTB energy calculated with Qcore 0.8.17 in Hartree |
The .csv file can be loaded in Python, for example using pandas.
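As a rough illustration of working with the labels, the sketch below loads the file with pandas, selects one of the split columns, and computes conformer energies relative to the lowest-energy geometry of each molecule. The helper names are hypothetical, the conversion factor 1 Hartree ≈ 627.509474 kcal/mol is a standard value rather than something stated in the dataset, and the relative-energy step is only an example of grouping by mol_id, not the procedure used in the paper.

```python
import pandas as pd

# Standard conversion factor (not taken from the dataset description).
HARTREE_TO_KCAL_PER_MOL = 627.509474


def load_labels(source):
    """Read denali_labels.csv (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def select_split(labels, column):
    """Keep rows whose split flag (e.g. 'training_set_plus') is True.

    Comparing the string form handles both boolean and text-typed columns.
    """
    mask = labels[column].astype(str).str.lower() == "true"
    return labels[mask]


def relative_dft_energies(labels):
    """DFT energy of each geometry relative to its molecule's minimum, in kcal/mol."""
    lowest = labels.groupby("mol_id")["dft_energy"].transform("min")
    return (labels["dft_energy"] - lowest) * HARTREE_TO_KCAL_PER_MOL
```

A typical call would be select_split(load_labels("denali_labels.csv"), "training_set_plus"), assuming the .csv file has been extracted from denali_labels.tar.gz into the working directory.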
Created: 2021-07-01



