ΔG-RDKit: Solvation Free Energy Database

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link: https://zenodo.org/record/8121560
Description:
We present the full database from the article "Explainable Supervised Machine Learning Model to Predict Solvation Free Energy". This database was used to build an ML model and contains a variety of solvent/solute pairs with known experimental solvation free energy (ΔGsolv) values.

Entries were collected from two separate databases: the FreeSolv library, with 642 experimental aqueous ΔGsolv determinations, and the Solv@TUM database, with 5597 entries for non-aqueous solvents. Both were selected for their wide range of solute/solvent pairs. Together they provide 6239 experimental values covering solutes containing light and heavy atoms, structurally diverse solvents, and small uncertainties. Experimental ΔGsolv values range from -14 to 4 kcal mol⁻¹, and each solute and solvent is represented by its chemical family, SMILES string, and InChIKey.

We generated 213 chemical descriptors for every solvent and solute in each entry using RDKit version 2022.09.4 running on Python 3.9. Molecules were built from their SMILES strings with the "MolFromSmiles" function in "rdkit.Chem", and descriptors with non-numerical values were removed. The descriptors encode chemical information describing the physicochemical characteristics of each compound, establishing a relationship between structure and ΔGsolv. Using machine learning regression algorithms, our models predicted ΔGsolv with high accuracy from the information encoded in these features.
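The descriptor step described above (descriptors per molecule, non-numerical descriptors removed, one entry per solvent/solute pair) can be sketched in plain Python. This is a hedged illustration, not the authors' code: the helper names, the `describe` callable, the `solvent_`/`solute_` column prefixes, and the rule that a descriptor is dropped when it is non-numeric for any molecule are all assumptions, and the toy values are made up. Keeping descriptor computation behind a callable lets the sketch run without RDKit installed.

```python
import math
from typing import Callable, Dict, List, Mapping, Tuple

# A descriptor table maps descriptor names to values for one molecule.
DescriptorTable = Mapping[str, object]


def is_numeric(value: object) -> bool:
    """Return True only for finite int/float values (bools are rejected)."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return math.isfinite(value)
    return False


def numeric_descriptor_names(tables: List[DescriptorTable]) -> List[str]:
    """Names present in every table whose value is numeric in every table."""
    if not tables:
        return []
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    return sorted(n for n in common if all(is_numeric(t[n]) for t in tables))


def build_pair_features(
    pairs: List[Tuple[str, str]],
    describe: Callable[[str], DescriptorTable],
) -> Tuple[List[str], List[List[float]]]:
    """Build one feature row per (solvent SMILES, solute SMILES) pair.

    Hypothetical helper. Assumption: each distinct molecule is described
    once, and a descriptor that is non-numeric for at least one molecule
    is dropped from every row.
    """
    cache: Dict[str, DescriptorTable] = {}
    for pair in pairs:
        for smiles in pair:
            if smiles not in cache:
                cache[smiles] = describe(smiles)
    keep = numeric_descriptor_names(list(cache.values()))
    header = [f"solvent_{n}" for n in keep] + [f"solute_{n}" for n in keep]
    rows = [
        [float(cache[solvent][n]) for n in keep]
        + [float(cache[solute][n]) for n in keep]
        for solvent, solute in pairs
    ]
    return header, rows


# Toy descriptor values for illustration only (not taken from the dataset).
TOY = {
    "O": {"MolWt": 18.015, "NumHDonors": 1, "Flag": float("nan")},
    "CCO": {"MolWt": 46.069, "NumHDonors": 1, "Flag": 0.0},
    "c1ccccc1": {"MolWt": 78.114, "NumHDonors": 0, "Flag": 1.0},
}

# "Flag" is NaN for water, so it is removed from every row.
header, rows = build_pair_features([("O", "CCO"), ("c1ccccc1", "CCO")], TOY.__getitem__)
print(header)
print(rows)
```

With RDKit installed, `describe` could plausibly be `lambda s: {name: fn(Chem.MolFromSmiles(s)) for name, fn in Descriptors.descList}` (after `from rdkit import Chem` and `from rdkit.Chem import Descriptors`). Since `MolFromSmiles` returns `None` for SMILES it cannot parse, invalid strings would need to be filtered out first.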
Created: 2023-07-07