Can Machines “Learn” Halide Perovskite Crystal Formation without Accurate Physicochemical Features?
Source: NIAID Data Ecosystem. Indexed: 2026-03-11
Download link:
https://figshare.com/articles/dataset/Can_Machines_Learn_Halide_Perovskite_Crystal_Formation_without_Accurate_Physicochemical_Features_/12461924
Resource description:
Discovery of new perovskite materials is motivated by a broad range
of materials applications and accelerated by recent advances in machine
learning (ML). We herein report dataset augmentation, benchmarking,
and interrogation for an ongoing experimental campaign consisting
of 9483 halide perovskite synthesis experiments. To address limitations
in previous work, we developed an improved description of the reactant
concentrations in the experiments (validated against experimental
observations) and performed experiments quantifying the excess volume
of mixing of γ-butyrolactone/formic acid mixtures used in the
perovskite syntheses. Combining this improved description of reactant
concentration with other physicochemical features of the reactants,
we constructed 1108 ML models to elucidate the roles of the algorithm
(k-nearest neighbors, linear support-vector machine,
and gradient boosted tree), feature set (12 in total), preprocessing
regime (e.g., standardization), and training data holdout scheme on
ML predictive ability. ML comparisons illustrated that the limited chemical
accuracy of less sophisticated physical models in a dataset does not
hinder interpolative model performance. Analysis of feature contributions
showed how ML models “learn” competitive representations
for concentration using raw experimental descriptions. Interrogation
of the most performant models indicated that the numerical values
of physicochemical features were not important; rather, these features
were being used to identify and interpolate within a particular reactant
set. ML models were shown to be capable of making rudimentary extrapolations
to untrained chemical systems when compared against basic benchmarks,
and models that included the newly developed chemical features were
shown to be more reliable than models trained without them. These results
illustrate how a stepwise comparative approach to machine learning
can provide insight into what and how much models are “learning” for a given prediction task.
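
The excess volume of mixing mentioned above is conventionally the difference between the measured volume of a mixture and the volume expected from ideal (additive) mixing. The sketch below computes the excess molar volume of a binary mixture from densities. It uses the textbook thermodynamic definition; the record does not state which form the authors used. The function name, the approximate pure-component densities, and the mixture density are illustrative assumptions, not values from the dataset.

```python
def excess_molar_volume(x1, m1, rho1, m2, rho2, rho_mix):
    """Excess molar volume (mL/mol) of a binary mixture.

    x1      : mole fraction of component 1 (0..1)
    m1, m2  : molar masses in g/mol
    rho1/2  : pure-component densities in g/mL
    rho_mix : measured mixture density in g/mL
    """
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("x1 must be a mole fraction between 0 and 1")
    x2 = 1.0 - x1
    # Real molar volume: mass of one mole of mixture over its measured density.
    v_real = (x1 * m1 + x2 * m2) / rho_mix
    # Ideal molar volume: mole-fraction-weighted sum of pure molar volumes.
    v_ideal = x1 * m1 / rho1 + x2 * m2 / rho2
    return v_real - v_ideal


# Illustrative inputs only (assumed, not taken from the dataset):
# gamma-butyrolactone ~86.09 g/mol, ~1.13 g/mL; formic acid ~46.03 g/mol, ~1.22 g/mL.
v_excess = excess_molar_volume(0.5, 86.09, 1.13, 46.03, 1.22, rho_mix=1.17)
print(f"Excess molar volume: {v_excess:.3f} mL/mol")
```

A negative result means the mixture occupies less volume than the sum of its parts, so concentrations computed by simply adding reagent volumes would be underestimated.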
Created:
2020-05-26



