ΔG-RDKit: Solvation Free Energy Database
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8121560
Description:
We present the full database for the article "Explainable Supervised Machine Learning Model to Predict Solvation Free Energy".
This database was used for an ML model and contains a variety of solvent-solute pairs with known experimental solvation free energy (ΔGsolv) values. Entries were collected from two separate databases: the FreeSolv library, with 642 experimental aqueous ΔGsolv determinations, and the Solv@TUM database, with 5597 entries for non-aqueous solvents. Both databases were selected for their broad coverage of solute/solvent pairs. Together they provide 6239 experimental values spanning light- and heavy-atom solutes and structurally diverse solvents, with small uncertainties in the reported values.
Experimental ΔGsolv values range from -14 to 4 kcal mol⁻¹, and each solute/solvent pair is represented by its chemical family, SMILES string and InChIKey. We generated 213 chemical descriptors for every solvent and solute in each entry using RDKit, version 2022.09.4, running on Python 3.9. Descriptors were calculated from molecules built with the “MolFromSmiles” function in “rdkit.Chem”, and descriptors with non-numerical values were removed. The descriptors encode significant chemical information and represent the physicochemical characteristics of the compounds, relating structure to ΔGsolv.
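The descriptor step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the helper names (keep_numeric, describe, pair_features) and the "solute_"/"solvent_" column prefixes are hypothetical, and the sketch assumes the descriptors come from RDKit's standard Descriptors.descList. The number of descriptors returned depends on the installed RDKit version and may not match the 213 reported here.

```python
import math
from numbers import Real


def keep_numeric(descriptors):
    """Drop entries whose value is not a finite real number."""
    return {
        name: float(value)
        for name, value in descriptors.items()
        if isinstance(value, Real)
        and not isinstance(value, bool)
        and math.isfinite(value)
    }


def describe(smiles):
    """Compute RDKit descriptors for one molecule (hypothetical helper)."""
    # Imported here so keep_numeric can be used without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparsable SMILES
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # descList is a list of (name, function) pairs shipped with RDKit.
    raw = {name: fn(mol) for name, fn in Descriptors.descList}
    return keep_numeric(raw)


def pair_features(solute_smiles, solvent_smiles):
    """Build one feature row for a solute/solvent pair (assumed layout)."""
    row = {f"solute_{k}": v for k, v in describe(solute_smiles).items()}
    row.update({f"solvent_{k}": v for k, v in describe(solvent_smiles).items()})
    return row
```

For example, pair_features("CCO", "O") would return a single dictionary of prefixed descriptors for ethanol in water, which could then be stacked into a table with one row per database entry.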
Using machine learning regression algorithms, our models predicted ΔGsolv with high accuracy from the information encoded in each chemical feature.
Created:
2023-07-07



