Improvement of Diffusion Coefficient Prediction by Active Learning
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Improvement_of_Diffusion_Coefficient_Prediction_by_Active_Learning/30013258
Description:
Methods for predicting diffusion coefficients in mixtures are essential
in many applications, as experimental data are scarce. Machine learning
(ML) methods offer promising alternatives to established semiempirical
models for predicting diffusion coefficients, but their performance
strongly depends on the available training data. Increasing the size
of data sets is a straightforward strategy for improving ML methods,
but measuring diffusion coefficients is costly, limiting the number
of experiments that can be carried out. We have therefore studied
active learning (AL) strategies for planning diffusion coefficient
measurements and the targeted improvement of ML methods for their
prediction, specifically matrix completion methods (MCMs) for predicting
diffusion coefficients at infinite dilution $D_{ij}^{\infty}$ in binary mixtures at 298 K. In the first step, different AL strategies
were systematically tested on a synthetic data set for $D_{ij}^{\infty}$, and uncertainty sampling was found to be a simple
but effective choice. This strategy was therefore used for planning $D_{ij}^{\infty}$ measurements using pulsed-field gradient
(PFG) nuclear magnetic resonance (NMR) spectroscopy. In total, $D_{ij}^{\infty}$ was measured in 19 mixtures for which
no data were previously available, and the data were used for retraining
two hybrid MCMs. The results show that significant improvement in
the prediction of $D_{ij}^{\infty}$ can be achieved with only
a few suitably planned experiments, but also that the impact strongly
depends on the prediction model used: while no clear influence on
the performance of an MCM that was trained on the residuals of the
semiempirical SEGWE model was found, the accuracy of a hybrid MCM
that incorporates SEGWE predictions as soft prior information could
be substantially increased, almost halving the relative mean squared
error on the test set.
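
As a rough illustration of the uncertainty-sampling idea described above, the following minimal Python sketch picks the next mixture to measure as the unmeasured solute/solvent pair with the largest predictive standard deviation. It is not the authors' implementation: the function names (select_next_measurement, relative_mse), the array layout (rows as solutes i, columns as solvents j), and the availability of a per-entry predictive standard deviation from the model are all assumptions made for illustration. The relative mean squared error shown is one common definition; the description does not state the exact formula used.

```python
import numpy as np

def select_next_measurement(pred_std, observed_mask):
    """Uncertainty sampling: return (i, j) of the unmeasured entry
    with the largest predictive standard deviation.

    pred_std      -- 2D array of predictive std per (solute i, solvent j);
                     assumed to come from some probabilistic model (hypothetical)
    observed_mask -- 2D boolean array, True where a measurement already exists
    """
    pred_std = np.asarray(pred_std, dtype=float)
    observed_mask = np.asarray(observed_mask, dtype=bool)
    if observed_mask.all():
        raise ValueError("all entries are already measured")
    # Exclude measured entries so they can never be selected.
    candidates = np.where(observed_mask, -np.inf, pred_std)
    i, j = np.unravel_index(np.argmax(candidates), candidates.shape)
    return int(i), int(j)

def relative_mse(y_true, y_pred):
    """Mean of squared relative deviations (an assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(((y_pred - y_true) / y_true) ** 2))
```

In an active learning loop, the selected pair would be measured, added to the training data, and the model retrained before the next selection.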
Created:
2025-08-29



