Machine Learning-Assisted QSAR Models on Contaminant Reactivity Toward Four Oxidants: Combining Small Data Sets and Knowledge Transfer

Name: Machine Learning-Assisted QSAR Models on Contaminant Reactivity Toward Four Oxidants: Combining Small Data Sets and Knowledge Transfer
Creator: acs.figshare.com
Published: 2023-06-01 00:00:00
License: No description provided

acs.figshare.com, updated 2023-06-01, indexed 2025-03-25

Download link:

https://acs.figshare.com/articles/dataset/Machine_Learning-Assisted_QSAR_Models_on_Contaminant_Reactivity_Toward_Four_Oxidants_Combining_Small_Data_Sets_and_Knowledge_Transfer/17207173/1

Resource description:

To develop predictive models for the reactivity of organic contaminants toward four oxidants (SO4•–, HClO, O3, and ClO2), all with small sample sizes, we proposed two approaches: combining small data sets and transferring knowledge between them. We first merged these data sets and developed a unified model using machine learning (ML), which showed better predictive performance than the individual models for HClO (RMSEtest: 2.10 to 2.04), O3 (2.06 to 1.94), ClO2 (1.77 to 1.49), and SO4•– (0.75 to 0.70) because the model “corrected” the wrongly learned effects of several atom groups. We further developed knowledge transfer models for three pairs of the data sets and observed different predictive performances: improved for O3 (RMSEtest: 2.06 to 2.01)/HClO (2.10 to 1.98), mixed for O3 (2.06 to 2.01)/ClO2 (1.77 to 1.95), and unchanged for ClO2 (1.77 to 1.77)/HClO (2.10 to 2.10). The effectiveness of the latter approach depended on whether there was consistent knowledge shared between the data sets and on the performance of the individual models. We also compared our approaches with multitask learning and image-based transfer learning and found that our approaches consistently improved the predictive performance for all data sets while the other two did not. This study demonstrated the effectiveness of combining small, similar data sets and transferring knowledge between them to improve ML model performance.
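To put the unified-model results on a common scale, the test-set RMSE pairs quoted in the description above can be converted into relative reductions. The snippet below is only a reader-side arithmetic sketch: the dictionary `reported`, the helper `relative_reduction`, and the variable `summary` are illustrative names, not part of the data set or of the authors' code, and the numbers are copied from the description rather than recomputed from the data.

```python
# Test-set RMSE pairs (individual model, unified model), copied from the
# description above. All names here are illustrative assumptions.
reported = {
    "HClO": (2.10, 2.04),
    "O3": (2.06, 1.94),
    "ClO2": (1.77, 1.49),
    "SO4•–": (0.75, 0.70),
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop in RMSE when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Relative RMSE reduction per oxidant, rounded to one decimal place.
summary = {
    oxidant: round(relative_reduction(before, after), 1)
    for oxidant, (before, after) in reported.items()
}

for oxidant, pct in summary.items():
    print(f"{oxidant}: RMSE lower by {pct}% with the unified model")
```

Read this way, ClO2 shows the largest relative gain from merging the data sets (roughly 16%), while HClO shows the smallest (roughly 3%).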

Provider:

acs.figshare.com
