Machine Learning-Assisted QSAR Models on Contaminant Reactivity Toward Four Oxidants: Combining Small Data Sets and Knowledge Transfer
Source: acs.figshare.com | Updated: 2023-06-01 | Indexed: 2025-03-25
Download link:
https://acs.figshare.com/articles/dataset/Machine_Learning-Assisted_QSAR_Models_on_Contaminant_Reactivity_Toward_Four_Oxidants_Combining_Small_Data_Sets_and_Knowledge_Transfer/17207173/1
Description:
To develop predictive models for
the reactivity of organic contaminants
toward four oxidants (SO4•–, HClO, O3, and ClO2), all with small
sample sizes, we proposed two approaches: combining small data sets
and transferring knowledge between them. We first merged these data
sets and developed a unified model using machine learning (ML), which
showed better predictive performance than the individual models for
HClO (test RMSE: 2.10 to 2.04), O3 (2.06 to 1.94),
ClO2 (1.77 to 1.49), and SO4•– (0.75 to 0.70) because the model “corrected” the wrongly
learned effects of several atom groups. We further developed knowledge
transfer models for three pairs of the data sets and observed different
predictive performances: improved for O3 (test RMSE: 2.06 to 2.01)/HClO (2.10 to 1.98), mixed for O3 (2.06
to 2.01)/ClO2 (1.77 to 1.95), and unchanged for ClO2 (1.77 to 1.77)/HClO (2.10 to 2.10). The effectiveness of the
latter approach depended on whether there was consistent knowledge
shared between the data sets and on the performance of the individual
models. We also compared our approaches with multitask learning and
image-based transfer learning and found that our approaches consistently
improved the predictive performance for all data sets while the other
two did not. This study demonstrated the effectiveness of combining
small, similar data sets and transferring knowledge between them to
improve ML model performance.
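
The gains for the unified model are easier to compare across oxidants when expressed as relative reductions in test RMSE. The short sketch below does that arithmetic using only the values quoted in the description above. The dictionary layout, the function name `percent_reduction`, and the ASCII key `SO4_radical` (standing in for SO4•–) are illustrative assumptions, not part of the dataset or the authors' code.

```python
# Test-set RMSE values quoted in the description:
# (individual model, unified model trained on the merged data sets).
REPORTED_RMSE = {
    "HClO": (2.10, 2.04),
    "O3": (2.06, 1.94),
    "ClO2": (1.77, 1.49),
    "SO4_radical": (0.75, 0.70),  # hypothetical ASCII label for SO4•–
}


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease in RMSE, in percent (positive means improvement)."""
    return 100.0 * (before - after) / before


reductions = {
    oxidant: percent_reduction(individual, unified)
    for oxidant, (individual, unified) in REPORTED_RMSE.items()
}

# List oxidants from largest to smallest relative improvement.
for oxidant, value in sorted(reductions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{oxidant}: {value:.1f}% lower test RMSE")
```

On these figures, ClO2 shows the largest relative reduction (roughly 16%) and HClO the smallest (roughly 3%).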
Provider:
acs.figshare.com



