Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Interpretable_Machine_Learning_Models_for_Phase_Prediction_in_Polymerization-Induced_Self-Assembly/22978538
Description:
While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from the experimental design is extremely challenging: whenever self-assemblies of novel monomer pairs are sought for specific applications, empirical phase diagrams must be constructed, which is time- and labor-intensive. To alleviate this burden, we develop the first framework for data-driven, probabilistic modeling of PISA morphologies, based on a selection and suitable adaptation of statistical machine learning methods. Because the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable, low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only 592 training data points, which we curated from the PISA literature. Among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show decent interpolation performance, with an estimated error rate of around 0.2 and an expected cross-entropy loss (surprisal) of about 1 bit when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When extrapolating to new monomer combinations, performance is weaker, but the best model (a random forest) still achieves highly nontrivial predictive performance (0.27 error rate, 1.6 bit surprisal), which makes it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, in three case studies we find that, when used to actively learn phase diagrams, the model selects a well-chosen set of experiments that yields satisfactory phase diagrams after observing relatively few data points (5–16) for the targeted conditions. The data set, as well as all model training and evaluation code, is publicly available through the GitHub repository of the last author.
Created:
2023-05-19
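The two evaluation metrics quoted in the description (error rate and surprisal in bits) can be illustrated with a minimal Python sketch. It assumes each experimental outcome is encoded as a single morphology label (a mixed phase such as spheres plus worms would then be a label of its own) and that a model returns a probability for every label. The error rate is the fraction of samples whose most probable predicted label differs from the observed one; the surprisal is the average of -log2 of the probability assigned to the observed label, i.e., the empirical cross-entropy in bits. The function name and morphology labels are hypothetical and not taken from the authors' code.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical encoding: morphology label -> predicted probability.
Proba = Dict[str, float]


def error_rate_and_surprisal(
    probs: List[Proba], labels: List[str], eps: float = 1e-12
) -> Tuple[float, float]:
    """Return (error rate, mean surprisal in bits) over a set of predictions.

    Assumes one observed morphology label per sample; eps guards against
    log2(0) when the model assigns zero probability to the observed label.
    """
    if not labels or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    errors = 0
    total_bits = 0.0
    for p, y in zip(probs, labels):
        predicted = max(p, key=p.get)  # most probable morphology (first one on ties)
        errors += predicted != y  # count a misclassification
        total_bits += -math.log2(max(p.get(y, 0.0), eps))  # surprisal of observed label
    n = len(labels)
    return errors / n, total_bits / n


# Illustrative (made-up) predictions for two experiments.
probs = [
    {"sphere": 0.5, "worm": 0.25, "vesicle": 0.25},
    {"sphere": 0.25, "worm": 0.5, "vesicle": 0.25},
]
print(error_rate_and_surprisal(probs, ["sphere", "vesicle"]))  # (0.5, 1.5)
```

For orientation, an average surprisal of 1 bit corresponds to a geometric-mean probability of 2^-1 = 0.5 on the observed morphology, and 1.6 bits to roughly 2^-1.6 ≈ 0.33.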
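The distinction between interpolation (monomer pairs already seen in training) and extrapolation (new monomer combinations) comes down to how the data are split for evaluation. A minimal sketch, assuming each record carries an identifier of its monomer pair: a random split lets test records share monomer pairs with training records, whereas a leave-one-pair-out split holds out every record of one monomer pair per fold, so the model must predict a combination it has never seen. The record layout and function names are hypothetical, and the authors' exact evaluation protocol may differ.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Tuple

# Hypothetical row layout: features, monomer-pair identifier, observed morphology.
Record = Dict[str, object]


def random_split(
    records: List[Record], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Interpolation setting: test rows may share monomer pairs with training rows."""
    rows = list(records)
    random.Random(seed).shuffle(rows)  # reproducible shuffle
    n_test = round(test_fraction * len(rows))
    return rows[n_test:], rows[:n_test]  # (train, test)


def leave_pair_out(
    records: List[Record], pair_of: Callable[[Record], Hashable]
) -> Iterator[Tuple[Hashable, List[Record], List[Record]]]:
    """Extrapolation setting: each fold holds out all rows of one monomer pair."""
    groups: Dict[Hashable, List[Record]] = defaultdict(list)
    for r in records:
        groups[pair_of(r)].append(r)
    for held_out, test in groups.items():
        # Training data contain no row of the held-out monomer pair.
        train = [r for pair, rows in groups.items() if pair != held_out for r in rows]
        yield held_out, train, test
```

Averaging the metrics over the leave-one-pair-out folds gives an extrapolation estimate, which is typically harder than the interpolation estimate from a random split, consistent with the weaker extrapolation figures reported above.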
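Active learning of a phase diagram can be sketched as a loop that refits the model on the observations gathered so far and then queries the candidate condition whose predicted morphology is most uncertain. The acquisition rule below (maximum predictive entropy, commonly called uncertainty sampling) is an assumption, as are all function names; the description does not state which selection criterion the authors used. In practice, `fit` could wrap a probabilistic classifier such as scikit-learn's `RandomForestClassifier` and return a function built on its `predict_proba` method.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # a candidate experimental condition (hypothetical type)
Proba = Dict[str, float]  # morphology label -> predicted probability


def entropy_bits(p: Proba) -> float:
    """Shannon entropy (in bits) of one predicted morphology distribution."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def select_next(candidates: Sequence[C], predict_proba: Callable[[C], Proba]) -> C:
    """Uncertainty sampling: return the candidate with maximal predictive entropy."""
    return max(candidates, key=lambda c: entropy_bits(predict_proba(c)))


def active_learning_loop(
    candidates: Sequence[C],
    run_experiment: Callable[[C], str],
    fit: Callable[[List[Tuple[C, str]]], Callable[[C], Proba]],
    budget: int,
) -> List[Tuple[C, str]]:
    """Hypothetical loop: refit, query the most uncertain condition, record the result."""
    remaining = list(candidates)
    observed: List[Tuple[C, str]] = []  # (condition, observed morphology) pairs
    for _ in range(min(budget, len(remaining))):
        predict_proba = fit(list(observed))  # model trained on the data so far
        condition = select_next(remaining, predict_proba)
        remaining.remove(condition)  # never query the same condition twice
        observed.append((condition, run_experiment(condition)))
    return observed
```

Entropy-based selection favors conditions near predicted phase boundaries, where the model is least certain; this is one plausible way a model could reach a usable phase diagram after only a handful of experiments.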



