
SurfPro - A curated database and predictive model of experimental properties of surfactants

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14931936
Description:
Dataset and code used for the publication in Digital Discovery titled 'SurfPro - A curated database and predictive model of experimental properties of surfactants'. See also https://github.com/BigChemistry-RobotLab/SurfPro
Abstract: Despite great industrial interest, modeling the physical properties of surfactants in water based on their molecular structure remains a challenge. A significant part of this challenge is in obtaining sufficient amounts of high-quality data. Experimentally determined properties such as the critical micelle concentration (CMC) and surface tension at CMC (γCMC) have been reported for many surfactants. However, surfactant data are scattered across many literature sources, and reported in a manner which is often unsuitable as input for predictive models. In this work, we address this limitation by compiling the SurfPro database of Surfactant Properties. SurfPro consists of 1624 surfactant entries curated from 223 literature sources, containing 1395 CMC values, 972 γCMC values and more than 657 values for Γmax, C20, πCMC and Amin. However, only 647 structures have all reported properties, and for most surfactants multiple properties are missing. We trained a previously reported graph neural network architecture for single- and multi-property prediction on these incomplete data of all surfactant types in the database to accurately predict pCMC (−log10(CMC)), γCMC, Γmax and pC20. We achieved state-of-the-art performance of these four properties using an ensemble of AttentiveFP models trained on ten different folds of the training data in the multi-property setting. Finally, we leveraged the predictions and uncertainties of the ensemble model to impute all missing properties for all 977 surfactants with an incomplete set of properties. We make our curated SurfPro database, proposed test split and training datasets, the imputed database, as well as our code publicly available.
Created:
2025-02-28
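
The abstract defines pCMC as −log10(CMC) and describes imputing missing properties from the predictions and uncertainties of a ten-model ensemble. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes CMC is given in mol/L (the abstract does not state the unit), that the ensemble prediction is the mean across members, and that the uncertainty is the sample standard deviation across members. The function names p_cmc, ensemble_summary and impute are hypothetical and do not come from the SurfPro code base.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def p_cmc(cmc_molar: float) -> float:
    """Return pCMC = -log10(CMC). Assumes CMC is in mol/L (an assumption)."""
    if cmc_molar <= 0:
        raise ValueError("CMC must be positive")
    return -math.log10(cmc_molar)


def ensemble_summary(predictions: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across ensemble members.

    Using the standard deviation as the uncertainty is an assumption of
    this sketch, not a statement about the published method.
    """
    mean = statistics.fmean(predictions)
    spread = statistics.stdev(predictions) if len(predictions) > 1 else 0.0
    return mean, spread


def impute(
    measured: Optional[float], predictions: Sequence[float]
) -> Tuple[float, Optional[float]]:
    """Keep a measured value when present; otherwise fall back to the
    ensemble mean and report the ensemble spread alongside it."""
    if measured is not None:
        return measured, None  # no model uncertainty for an experimental value
    return ensemble_summary(predictions)
```

For example, p_cmc(1e-3) corresponds to a CMC of 1 mM, and impute(None, preds) would fill a missing property from a list of per-model predictions while leaving experimentally measured values untouched.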