Geometries and Dipole Moments calculated by B3LYP/6-31G(d,p) for 10071 Organic Molecular Structures
Source: DataCite Commons. Updated 2020-08-31; indexed 2024-07-25.
Download link:
https://figshare.com/articles/Geometries_and_Dipole_Moments_calculated_by_B3LYP_6-31G_d_p_for_10071_Organic_Molecular_Structures/5716246
Description:
Geometries and dipole moments calculated by B3LYP/6-31G(d,p) for 10071 organic molecular structures.

Related publication:

* Florbela Pereira and Joao Aires-de-Sousa: Machine Learning for the Prediction of Molecular Dipole Moments Obtained by Density Functional Theory. J. Cheminf. (2018)
  https://doi.org/10.1186/s13321-018-0296-5
  DOI: 10.1186/s13321-018-0296-5

This data set is publicly available at http://dx.doi.org/10.6084/m9.figshare.5716246

Files
-----

dipole_moments_10071mols_sdf.tar.gz - 10071 molecules in the MDL SDFile format, including the atomic coordinates of equilibrium geometries calculated by B3LYP/6-31G(d,p).

dipole_moments_10071mols.xlsx - Dipole moments calculated by B3LYP/6-31G(d,p) for 10071 neutral organic molecules.

Molecules
---------

Molecular structures were retrieved from the ZINC database [1], PubChem database [2] and the GDB-13 database [3] of small organic molecules containing up to 7 atoms of C, N, O, F, S, Cl and Br. The structures were standardized with ChemAxon Standardizer (JChem 15.4.6, 2015, ChemAxon Ltd., Budapest, Hungary, http://www.chemaxon.com) and OpenBabel (Open Babel Package, version 2.3.1, http://openbabel.org) for neutralization and inclusion of all hydrogen atoms. Duplicate molecules were discarded based on canonical SMILES and InChI codes (stereoisomers were treated as duplicate structures). The final database consists of 10,071 molecules with molecular weights (MWs) in the range 40-251 g/mol, containing up to 19 atoms of the elements C, N, O, F, S, Cl, Br, and P. The total number of atoms in a molecule (including hydrogen atoms) ranges from 6 to 43.

Molecular geometries were first relaxed with the PM7 method using the MOPAC software [4] and then optimized with the GAMESS program [5] using the B3LYP functional and the 6-31G(d,p) basis set, followed by dipole moment calculation at the same level of theory.
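The description does not say how stereoisomers were matched during duplicate removal. As a minimal sketch of one possible approach (an assumption, not necessarily the authors' procedure), the stereo layers of a standard InChI string (`/b`, `/t`, `/m`, `/s`) can be dropped so that stereoisomers collapse to the same key. The function names `strip_stereo` and `deduplicate` are hypothetical.

```python
# Assumption: InChI layers that encode stereochemistry start with b, t, m or s.
STEREO_PREFIXES = ("b", "t", "m", "s")


def strip_stereo(inchi):
    """Return the InChI string without its stereochemical layers."""
    head, *layers = inchi.split("/")
    # layers[0] is the chemical formula and is always kept.
    kept = [
        layer
        for i, layer in enumerate(layers)
        if i == 0 or not layer.startswith(STEREO_PREFIXES)
    ]
    return "/".join([head, *kept])


def deduplicate(records):
    """Keep the first molecule ID seen for each stereo-free InChI key.

    records: iterable of (molecule_id, inchi) pairs.
    Returns a dict mapping stereo-free InChI -> first molecule ID.
    """
    unique = {}
    for mol_id, inchi in records:
        unique.setdefault(strip_stereo(inchi), mol_id)
    return unique
```

With this key, L-alanine and D-alanine (which differ only in the `/m` layer) map to a single entry, while constitutionally different molecules stay separate.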

Format
------

Each molecule is stored in its own file, ending in ".sdf". These are the structures optimized at the B3LYP/6-31G(d,p) level.

The format is the standard MDL SDFile generated with ChemAxon Standardizer and OpenBabel.

Dipole moments are stored in the dipole_moments_10071mols.xlsx file.

Column content of the .xlsx file
------

1. Molecule ID (as it appears in the corresponding .sdf file name)

2. Dipole moment (in Debye)

References
------

[1] Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 2012, 52:1757-1768.

[2] Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH: PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44(D1):D1202-13.

[3] Blum LC, Reymond J-L: 970 Million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 2009, 131:8732-8733.

[4] MOPAC2012, James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, CO, USA, http://OpenMOPAC.net (2012).

[5] Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JJ, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA: General atomic and molecular electronic structure system. J Comput Chem 1993, 14:1347-1363. GAMESS Version 1 May 2013 (R1).
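
To illustrate the file layout described under Format, the following is a minimal sketch for reading the atoms of one .sdf file with the Python standard library only. It assumes the files use the fixed-width V2000 mol block layout, which the description does not state explicitly. The function names and the `EXPECTED_ELEMENTS` set (the listed elements plus hydrogen) are hypothetical helpers, not part of the data set.

```python
from collections import Counter
from pathlib import Path

# Assumption: the elements named in the description, plus hydrogen.
EXPECTED_ELEMENTS = {"H", "C", "N", "O", "F", "S", "Cl", "Br", "P"}


def molecule_id(sdf_path):
    """Return the file name without ".sdf" (assumed to match column 1 of the .xlsx)."""
    return Path(sdf_path).stem


def parse_v2000_atoms(sdf_text):
    """Read (symbol, x, y, z) tuples from the first V2000 mol block in sdf_text."""
    lines = sdf_text.splitlines()
    # Line 4 is the counts line; the atom count occupies its first three columns.
    n_atoms = int(lines[3][0:3])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # Coordinates are three 10-character fields; the symbol follows a space.
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))
    return atoms


def summarize(atoms):
    """Count atoms and flag element symbols outside EXPECTED_ELEMENTS."""
    counts = Counter(symbol for symbol, *_ in atoms)
    return {
        "n_atoms": len(atoms),
        "n_heavy": len(atoms) - counts.get("H", 0),
        "unexpected": sorted(set(counts) - EXPECTED_ELEMENTS),
    }
```

A file extracted from the archive could then be read with `parse_v2000_atoms(Path(path).read_text())`, and `molecule_id(path)` used to look up the matching dipole moment once the .xlsx file has been loaded with a spreadsheet reader of choice.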
Provider:
figshare
Created:
2017-12-19



