Synergizing Machine Learning, Conceptual Density Functional Theory, and Biochemistry: No-Code Explainable Predictive Models for Mutagenicity in Aromatic Amines
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Synergizing_Machine_Learning_Conceptual_Density_Functional_Theory_and_Biochemistry_No-Code_Explainable_Predictive_Models_for_Mutagenicity_in_Aromatic_Amines/27652912
Resource description:
This study synergizes machine learning (ML) with conceptual density
functional theory (CDFT) to develop OECD-compliant predictive models
for the mutagenic activity of aromatic amines (AAs) with a fully No-Code
methodology using a comprehensive data set of 251 AAs, Leave-One-Out Cross-Validation
(LOOCV), and three distinct data splits. Our research employs the
GFN2-xTB method, known for its robustness and speed, to compute descriptors
for procarcinogens and their activated metabolites in vacuum and aqueous
phases. We evaluate the effectiveness of different theoretical definitions
of electrophilicity within CDFT, namely, PSL, GCV, and CDP schemes,
and the newly introduced Log QP descriptor to approximate Log P information.
SPAARC, RandomTree, and JCHAID* ML methods were used to build explainable
predictive models with highly robust internal validation (Avg. Correct
Classifications = 76% and Avg. Kappa = 0.29) and external validation
(Avg. Correct Classifications = 79% and Avg. Kappa = 0.33) metrics,
and the results were compared to those of a two-hidden-layer Multilayer
Perceptron. The results indicate that the second CDP definition of
electrophilicity, in both the vacuum and aqueous phases, and the
newly presented Log QP descriptor are the most important descriptors for
predicting the mutagenic activity of AAs (namely ω+VacCDP2+, ω+AqCDP2+, and LogQP1+Vac, respectively). Metabolic activation,
aqueous solvent properties, the CDP electrophilicity schemes, and
Log QP should therefore be considered when building predictive models for the
mutagenic activity of AAs. This study offers a replicable, No-Code
approach to QSAR research, making high-level ML and CDFT applications
accessible to a broader audience. Future work will expand these methods
to other compound families, enhancing predictive capabilities in the
study of mutagenic activities and other biological phenomena.
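
The validation metrics quoted above (percentage of correct classifications and Cohen's kappa under LOOCV) can be illustrated with a minimal Python sketch. It is not the authors' No-Code workflow: the 1-nearest-neighbour classifier is only a stand-in for the tree learners named in the abstract, and the function names and the single-descriptor toy data are hypothetical, chosen for illustration.

```python
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement; p_e is the agreement expected by
    chance from the marginal label frequencies.
    """
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = accuracy(y_true, y_pred)
    p_e = sum(
        (sum(t == k for t in y_true) / n) * (sum(p == k for p in y_pred) / n)
        for k in labels
    )
    if p_e == 1.0:
        # Kappa is undefined when only one label occurs; this sketch returns 0.0.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


def one_nn(X_train: List[List[float]], y_train: List[int], x: List[float]) -> int:
    """1-nearest-neighbour prediction (stand-in classifier, an assumption)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


def loocv_predictions(
    X: List[List[float]],
    y: List[int],
    fit_predict: Callable[[List[List[float]], List[int], List[float]], int],
) -> List[int]:
    """Leave-one-out CV: hold out each sample once, train on the rest."""
    preds = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        preds.append(fit_predict(X_train, y_train, X[i]))
    return preds


if __name__ == "__main__":
    # Made-up descriptor values (one feature per compound) and labels
    # (1 = mutagenic, 0 = non-mutagenic); not taken from the data set.
    X = [[0.0], [0.1], [1.0], [1.1]]
    y = [0, 0, 1, 1]
    preds = loocv_predictions(X, y, one_nn)
    print("accuracy:", accuracy(y, preds))
    print("kappa:", cohen_kappa(y, preds))
```

Kappa corrects the raw agreement for the agreement expected by chance, so on class-imbalanced data it can be much lower than the percentage of correct classifications.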
Created:
2024-11-11



