Improvement of Diffusion Coefficient Prediction by Active Learning

Source: Figshare. Updated: 2025-08-29. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Improvement_of_Diffusion_Coefficient_Prediction_by_Active_Learning/30013258
Description:
Methods for predicting diffusion coefficients in mixtures are essential in many applications, as experimental data are scarce. Machine learning (ML) methods offer promising alternatives to established semiempirical models for predicting diffusion coefficients, but their performance depends strongly on the available training data. Increasing the size of data sets is a straightforward strategy for improving ML methods, but measuring diffusion coefficients is costly, which limits the number of experiments that can be carried out. We have therefore studied active learning (AL) strategies for planning diffusion coefficient measurements and for the targeted improvement of ML methods for their prediction, specifically matrix completion methods (MCMs) for predicting diffusion coefficients at infinite dilution D_ij^∞ in binary mixtures at 298 K. In the first step, different AL strategies were systematically tested on a synthetic data set for D_ij^∞, and uncertainty sampling was found to be a simple but effective choice. This strategy was therefore used for planning D_ij^∞ measurements using pulsed-field gradient (PFG) nuclear magnetic resonance (NMR) spectroscopy. In total, D_ij^∞ was measured in 19 mixtures for which no data were previously available, and the data were used for retraining two hybrid MCMs. The results show that a significant improvement in the prediction of D_ij^∞ can be achieved with only a few suitably planned experiments, but also that the impact depends strongly on the prediction model used. No clear influence was found on the performance of an MCM trained on the residuals of the semiempirical SEGWE model, whereas the accuracy of a hybrid MCM that incorporates SEGWE predictions as soft prior information was substantially increased, almost halving the relative mean squared error on the test set.
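
The uncertainty sampling step mentioned in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a probabilistic MCM supplies a predictive standard deviation for every solute/solvent entry, and the names `select_by_uncertainty`, `pred_std`, and `observed_mask` are hypothetical. The sketch simply proposes the unmeasured entries with the largest predictive uncertainty as the next experiments.

```python
import numpy as np


def select_by_uncertainty(pred_std, observed_mask, n_queries):
    """Pick the unobserved matrix entries with the largest predictive std.

    pred_std      : 2-D array of predictive standard deviations (assumed
                    to come from a probabilistic matrix completion model).
    observed_mask : 2-D boolean array, True where a measurement exists.
    n_queries     : number of new experiments to propose.
    Returns a list of (row, column) index pairs, most uncertain first.
    """
    std = np.asarray(pred_std, dtype=float)
    observed = np.asarray(observed_mask, dtype=bool)

    # Entries that already have data must never be proposed again.
    scores = np.where(observed, -np.inf, std)

    # Do not ask for more queries than there are unobserved entries.
    k = min(n_queries, int((~observed).sum()))

    # Stable sort on the negated scores: descending order, deterministic ties.
    order = np.argsort(-scores, axis=None, kind="stable")[:k]
    rows, cols = np.unravel_index(order, scores.shape)
    return [(int(i), int(j)) for i, j in zip(rows, cols)]


# Toy 3 x 3 example with made-up numbers (illustrative only, not real data).
pred_std = np.array([[0.1, 0.9, 0.3],
                     [0.7, 0.2, 0.8],
                     [0.4, 0.6, 0.5]])
observed_mask = np.array([[True, False, False],
                          [False, True, False],
                          [False, False, True]])

queries = select_by_uncertainty(pred_std, observed_mask, n_queries=2)
print(queries)
```

In an active learning loop, the proposed entries would be measured, added to the training data, and the model retrained before the next selection round.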
Created:
2025-08-29