Hybrid pKa Dataset for Acid Molecules
Source: NIAID Data Ecosystem. Indexed: 2026-05-01.
Download link:
https://zenodo.org/record/10551521
Description:
This hybrid dataset contains 13460 acid molecules, each with its acid dissociation constant, expressed as \(\mathrm{p}K_{\mathrm{a}}\), and the temperature at which that value was measured.
The dataset was built in two steps. The first step combined two sources: a digitized literature collection [1] and a chemistry-aware software package that predicts \(\mathrm{p}K_{\mathrm{a}}\) [2].
Among other variables, both source datasets provided the SMILES representation of each molecule, its \(\mathrm{p}K_{\mathrm{a}}\), and the temperature (see the table below). Duplicated compounds were removed during preprocessing, but only when their temperatures matched; entries for the same molecule at different temperatures were kept. The dataset resulting from this first step contains 6030 entries.
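The duplicate-removal rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' preprocessing code: the record keys (`smiles`, `pka`, `temperature_c`) and the function name are hypothetical, duplicates are detected by exact string match on the SMILES (no canonicalization), the first occurrence is the one kept, and the sample values are placeholders rather than entries taken from the dataset.

```python
def drop_same_temperature_duplicates(records):
    """Keep the first record seen for each (SMILES, temperature) pair.

    Assumption: two records are duplicates only if both the SMILES string
    and the temperature match exactly. The same molecule measured at a
    different temperature is kept as a separate entry.
    """
    seen = set()
    kept = []
    for record in records:
        key = (record["smiles"], record["temperature_c"])
        if key in seen:
            continue  # same molecule at the same temperature: drop it
        seen.add(key)
        kept.append(record)
    return kept


# Placeholder records for illustration only (not taken from the dataset).
records = [
    {"smiles": "CC(=O)O", "pka": 4.76, "temperature_c": 25.0},
    {"smiles": "CC(=O)O", "pka": 4.76, "temperature_c": 25.0},  # dropped
    {"smiles": "CC(=O)O", "pka": 4.77, "temperature_c": 35.0},  # kept
    {"smiles": "OC=O", "pka": 3.75, "temperature_c": 25.0},
]

deduplicated = drop_same_temperature_duplicates(records)
print(len(deduplicated), "records kept out of", len(records))
```

Because the same molecule can be written as several different SMILES strings, a real pipeline would probably canonicalize the strings with a cheminformatics toolkit before comparing them; the description does not say whether that was done here.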
The second step extended the dataset in two ways: by adding the dataset from [3], and by adding the results of simulations that we ran with a conditional variational autoencoder (CVAE) trained on the dataset from step 1. The final hybrid dataset contains 13460 molecules.
| Parameter | Description |
| --- | --- |
| SMILES | String representation of the molecular structure. |
| \(\mathrm{p}K_{\mathrm{a}}\) | The negative base-10 logarithm of the acid dissociation constant \(K_{\mathrm{a}}\). |
| Temperature (°C) | The temperature, in degrees Celsius, at which \(\mathrm{p}K_{\mathrm{a}}\) was measured. |
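As a small worked example of the definition in the table, pKa = -log10(Ka), the conversion in both directions can be written with the Python standard library. The function names and the sample value are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
import math


def pka_from_ka(ka):
    """Return pKa = -log10(Ka) for a positive dissociation constant."""
    if ka <= 0:
        raise ValueError("Ka must be positive")
    return -math.log10(ka)


def ka_from_pka(pka):
    """Invert the definition: Ka = 10 ** (-pKa)."""
    return 10.0 ** (-pka)


# Illustrative value only: an acid with Ka = 1e-5 has a pKa of about 5.
example_ka = 1.0e-5
example_pka = pka_from_ka(example_ka)
print(f"Ka = {example_ka:.1e}  ->  pKa = {example_pka:.2f}")
```

Since the scale is logarithmic, a difference of one pKa unit corresponds to a tenfold change in Ka, and a lower pKa indicates a stronger acid.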
This hybrid dataset was created for use within the MOZART project (Grant Agreement 101058450). Website: Mozart-Project.
[1] Jonathan Zheng. (2022). IUPAC/Dissociation-Constants: v1.0 (v1-0_initial-release) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7236453
[2] Thomas Sander, Joel Freyss, Modest von Korff, and Christian Rufener. (2015). DataWarrior: An Open-Source Program For Chemistry Aware Data Visualization And Analysis. Journal of Chemical Information and Modeling, 55(2), 460-473. https://doi.org/10.1021/ci500588j. Website: www.openmolecules.org.
[3] khoivan88. pka_lookup: Python script to lookup pKa values. GitHub repository: khoivan88/pka_lookup.
Created:
2024-02-05



