OrbNet Denali Training Data

Source: Figshare. Updated 2021-07-01; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/OrbNet_Denali_Training_Data/14883867
Description:
OrbNet Denali Training Data

This repository contains the data for the paper "OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy". The data set consists of molecular geometries and the corresponding energy labels calculated at the DFT and semi-empirical levels.

Citation

Anders S. Christensen(1,a), Sai Krishna Sirumalla(1,a), Zhuoran Qiao(2), Michael B. O'Connor(1), Daniel G. A. Smith(1), Feizhi Ding(1), Peter J. Bygrave(1), Animashree Anandkumar(3,4), Matthew Welborn(1), Frederick R. Manby(1), and Thomas F. Miller III(1,2), "OrbNet Denali: A machine learning potential for biological and organic chemistry with semi-empirical cost and DFT accuracy" (2021), https://arxiv.org/abs/2107.00299

a) Indicates equal contribution
1) Entos, Inc., Los Angeles, CA 90027, USA
2) Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
3) Division of Engineering and Applied Sciences, California Institute of Technology, Pasadena, CA 91125, USA
4) NVIDIA, Santa Clara, CA 95051, USA

Contents

The following files are included:

Filename                  Description                                       MD5 checksum
denali_labels.tar.gz      .csv file with energy labels and other metadata   bc9b612f75373d1d191ce7493eebfd62
denali_xyz_files.tar.gz   Archive with .xyz geometry files                  edd35e95a018836d5f174a3431a751df

Geometry data

The geometries are stored in XYZ+ format, which is compatible with the standard .xyz format but additionally annotates the charge and spin multiplicity, in that order, in the comment line (the 2nd line). The coordinates are in units of Ångstrøm. For example, a water molecule with a charge of 0 and a spin multiplicity of 1 (i.e. a singlet) can be specified in this format as:

    3
    0 1
    O -1.08201 1.07900 -0.02472
    H -0.09268 1.08664 0.01745
    H -1.37137 1.24781 0.90715

The directory structure of the geometry data contained within denali_xyz_files.tar.gz is as follows:

    xyz_files/
    ├── mol_id1/
    │   ├── sample_id0.xyz
    │   ├── sample_id1.xyz
    │   ├── sample_id2.xyz
    │   ├── sample_id3.xyz
    │   └── sample_id4.xyz
    ├── mol_id2/
    │   ├── sample_id0.xyz
    │   ├── sample_id1.xyz
    │   ├── sample_id2.xyz
    │   └── sample_id3.xyz
    ├── ... etc

Each mol_id uniquely identifies a molecule, and the conformer geometries for that molecule are stored in the corresponding folder. Each geometry is in turn identified by a unique sample_id. The grouping of geometries by mol_id is used in the OrbNet loss function; see Eq. 3 in the paper. Note that not all molecules have multiple geometries.

Training labels

The training labels (i.e. the wB97X-D3/def2-TZVP and GFN1-xTB energies, in units of Hartree) and the training and test/validation splits are provided in the file denali_labels.csv. All molecules are singlet states. The .csv file contains the following columns:

Column              Description
sample_id           A unique hash generated from the QM input; also corresponds to the .xyz filename of that geometry
subset              The data source for that geometry; please refer to the paper for a detailed description of the various subsets
mol_id              Identifier for the parent molecule
test_set            True if the geometry is part of the test/validation set of neutral molecules
test_set_plus       True if the geometry is part of the test/validation set of charged molecules
prelim_1            True if the geometry is part of the 10% OrbNet Denali training set
training_set_plus   True if the geometry is part of the full OrbNet Denali training set
charge              The charge of the molecule
dft_energy          wB97X-D3/def2-TZVP energy calculated with Qcore 0.8.17, in Hartree
xtb1_energy         GFN1-xTB energy calculated with Qcore 0.8.17, in Hartree

The .csv file can be loaded in Python, for example using pandas.
Created:
2021-07-01