Combining Machine Learning and Molecular Dynamics to Predict P‑Glycoprotein Substrates
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Combining_Machine_Learning_and_Molecular_Dynamics_to_Predict_P_Glycoprotein_Substrates/12841135
Description:
The efflux transporter P-glycoprotein (P-gp) is responsible for
the extrusion of a wide variety of molecules, including drug molecules,
from the cell. Therefore, P-gp-mediated efflux transport limits the
bioavailability of drugs. To identify potential P-gp substrates early
in the drug discovery process, in silico models have
been developed based on structural and physicochemical descriptors.
In this study, we investigate the use of molecular dynamics fingerprints
(MDFPs) as an orthogonal descriptor for the training of machine learning
(ML) models to classify small molecules into substrates and nonsubstrates
of P-gp. MDFPs encode the information from short MD simulations of
the molecules in different environments (water, membrane, or protein
pocket). The performance of the MDFPs, evaluated on both an in-house
dataset (3930 compounds) and a public dataset from ChEMBL (1114 compounds),
is compared to that of commonly used 2D molecular descriptors, including
structure-based and property-based descriptors. We find that all tested
classifiers interpolate well, achieving high accuracy on chemically
diverse subsets. However, by challenging the models with external
validation and prospective analysis, we show that only tree-based
ML models trained on MDFPs or property-based descriptors generalize
well to regions of the chemical space not covered by the training
set.
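
To make the idea of encoding a short MD simulation as a fixed-length descriptor more concrete, here is a minimal sketch. It assumes, beyond what this description states, that each simulation yields per-frame property series and that each series is summarised by its mean, standard deviation, and median. The function name `mdfp_like_fingerprint`, the property names, and the toy values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, median, pstdev


def mdfp_like_fingerprint(trajectories):
    """Summarise per-frame property series into one fixed-length vector.

    `trajectories` maps a property name (hypothetical, e.g. "sasa") to the
    list of per-frame values extracted from a single simulation.
    Assumption: three summary statistics per property (mean, population
    standard deviation, median).
    """
    fingerprint = []
    # Sorted keys give a stable feature order across molecules.
    for name in sorted(trajectories):
        series = trajectories[name]
        fingerprint.extend([mean(series), pstdev(series), median(series)])
    return fingerprint


# Toy values standing in for properties from a short simulation in water.
water_run = {
    "sasa": [4.0, 5.0, 6.0, 5.0],
    "radius_of_gyration": [1.0, 1.0, 3.0, 3.0],
}

features = mdfp_like_fingerprint(water_run)
print(features)
```

Vectors built this way for each environment (water, membrane, or protein pocket) could be concatenated and passed to a tree-based classifier alongside or instead of 2D descriptors.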
Created:
2020-08-07



