five

Improved Chemical Prediction from Scarce Data Sets via Latent Space Enrichment

收藏
NIAID Data Ecosystem2026-03-11 收录
下载链接:
https://figshare.com/articles/dataset/Improved_Chemical_Prediction_from_Scarce_Data_Sets_via_Latent_Space_Enrichment/8082641
下载链接
链接失效反馈
官方服务:
资源简介:
Modern machine learning provides promising methods for accelerating the discovery and characterization of novel chemical species. However, in many areas experimental data remain costly and scarce, and computational models are unavailable for targeted figures of merit. Here we report a promising pathway to address this challenge by using chemical latent space enrichment, whereby disparate data sources are combined in joint prediction tasks to enable improved prediction in data-scarce applications. The approach is demonstrated for pKa prediction of moderately sized molecular species using a combination of experimentally available pKa data and density functional theory-based characterizations of the (de)­protonation free energy. A novel autoencoder framework is used to create a continuous chemical latent space that is then used in single and joint training tasks for property prediction. By combining these two data sets in a jointly trained autoencoder framework, we observe mutual improvement in property prediction tasks in the scarce data limit. We also demonstrate an enrichment mechanism that is unique to latent space training, whereby training on excess computational data can mitigate the prediction losses associated with scarce experimental data and advantageously organize the latent space. These results demonstrate that disparate chemical data sources can be advantageously combined in an autoencoder framework with potential general application to data-scarce chemical learning tasks.
创建时间:
2019-04-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作