Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment
Indexed by NIAID Data Ecosystem: 2026-03-11
Download link:
https://figshare.com/articles/dataset/Improved_Chemical_Prediction_from_Scarce_Data_Sets_via_Latent_Space_Enrichment/8082641
Summary:
Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for the targeted figures of merit. Here we report a pathway to address this challenge using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to improve prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory (DFT)-based characterizations of the (de)protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space, which is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce-data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be effectively combined in an autoencoder framework, with potential general application to data-scarce chemical learning tasks.
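
For readers connecting the two data sources, the standard thermodynamic relation between a deprotonation free energy and pKa is shown below. This is textbook thermodynamics rather than a detail taken from this record; the summary does not say whether the authors convert DFT free energies into pKa units or treat them purely as a separate prediction target.

```latex
\mathrm{HA(aq)} \rightarrow \mathrm{A^-(aq)} + \mathrm{H^+(aq)}, \qquad
\Delta G^\circ_{\mathrm{deprot}} = -RT\ln K_a = (RT\ln 10)\,\mathrm{p}K_a
\quad\Longrightarrow\quad
\mathrm{p}K_a = \frac{\Delta G^\circ_{\mathrm{deprot}}}{RT\ln 10}
```

At T = 298.15 K, RT ln 10 is about 5.708 kJ/mol (about 1.364 kcal/mol), so a 1 kcal/mol error in a computed free energy shifts the corresponding pKa by roughly 0.73 units.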
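
The summary does not specify the training objective. As a minimal sketch, assuming the common approach of masking missing labels, the snippet below shows how one shared latent space could be trained on molecules that carry an experimental pKa, a DFT free energy, both, or neither: each property head contributes only over the molecules that actually have that label. The names `masked_mse` and `joint_loss`, the weights `w_pka` and `w_dg`, the use of NaN to mark a missing label, and the toy numbers are all hypothetical; the autoencoder's reconstruction loss is assumed to be computed elsewhere and passed in as a number.

```python
from math import isnan, nan
from typing import Sequence


def masked_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over entries whose target is present.

    A NaN target marks a missing label (hypothetical convention); such
    entries contribute nothing. Returns 0.0 if no labels are present.
    """
    pairs = [(p, t) for p, t in zip(pred, target) if not isnan(t)]
    if not pairs:
        return 0.0
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def joint_loss(
    recon_loss: float,
    pka_pred: Sequence[float],
    pka_true: Sequence[float],
    dg_pred: Sequence[float],
    dg_true: Sequence[float],
    w_pka: float = 1.0,
    w_dg: float = 1.0,
) -> float:
    """Hypothetical joint objective: reconstruction plus two masked property losses."""
    return (
        recon_loss
        + w_pka * masked_mse(pka_pred, pka_true)
        + w_dg * masked_mse(dg_pred, dg_true)
    )


# Toy batch of four molecules (made-up numbers): only the first has an
# experimental pKa, while the last three have a DFT free energy (kcal/mol).
pka_true = [4.8, nan, nan, nan]
pka_pred = [4.5, 3.0, 9.9, 7.1]
dg_true = [nan, 10.0, 14.0, 9.0]
dg_pred = [5.0, 11.0, 13.0, 9.0]

total = joint_loss(0.5, pka_pred, pka_true, dg_pred, dg_true)
print(round(total, 4))  # prints 1.2567
```

With this masking, an abundant DFT set can keep shaping the shared latent space even when only a handful of molecules carry experimental pKa labels, which is one plausible reading of the enrichment effect described in the summary.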
Created: 2019-04-29



