Classification of HIV‑1 Protease Inhibitors by Machine Learning Methods
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Classification_of_HIV_1_Protease_Inhibitors_by_Machine_Learning_Methods/7370153
Description:
HIV-1 protease plays an important role in the process of viral
infection. The protease is an effective therapeutic target for the treatment
of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease
inhibitors (PIs) from ChEMBL. A series of 15 classification models
for predicting the active inhibitors were built by machine learning
methods, including k-nearest neighbors (K-NN), decision
tree (DT), random forest (RF), support vector machine (SVM), and deep
neural network (DNN). The molecular structures were characterized
by (1) fingerprint descriptors including MACCS fingerprints and PubChem
fingerprints and (2) physicochemical descriptors calculated by CORINA
Symphony. All of the models achieved prediction accuracies above
70% on the test set; the best accuracy of 83.07% was obtained
by model 4A, which was built by the SVM method based on MACCS fingerprint
descriptors. Nine consensus models were built on the three kinds of
descriptors, combining all of the machine learning methods
through “consensus prediction”. Model C3a, developed with MACCS
fingerprint descriptors, showed the highest
accuracy on both the training set (91.96%) and the test set (83.15%). An external
validation set including 35,989 compounds from the DUD database and 239
active inhibitors from the recent literature was used to verify the
performance of our model. The best prediction accuracy of 98.37% was
obtained by model 3C, which was built by RF based on CORINA Symphony
descriptors. In addition, analysis of the molecular descriptors shows
that aromatic systems and atoms involved in hydrogen bonding
make important contributions to the bioactivity of PIs.
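
The description does not spell out how "consensus prediction" combines the individual classifiers. One common reading is a simple majority vote over the per-model predictions for each compound, sketched below in Python. The model keys, votes, and labels are made-up illustration values, not data from the study, and the tie-breaking rule is an assumption of this sketch.

```python
from collections import Counter
from typing import Sequence


def consensus_predict(votes: Sequence[int]) -> int:
    """Majority vote over per-model labels (1 = active, 0 = inactive).

    Assumption: ties fall back to 0 (inactive). The source does not
    state how ties, if any, were handled.
    """
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical predictions from five classifiers for four compounds.
model_predictions = {
    "knn": [1, 0, 1, 0],
    "dt":  [1, 1, 0, 0],
    "rf":  [1, 0, 1, 1],
    "svm": [0, 0, 1, 0],
    "dnn": [1, 0, 0, 0],
}
true_labels = [1, 0, 1, 1]  # hypothetical ground truth

# Transpose so each tuple holds all model votes for one compound.
votes_per_compound = list(zip(*model_predictions.values()))
consensus = [consensus_predict(v) for v in votes_per_compound]

print(consensus)                         # [1, 0, 1, 0]
print(accuracy(true_labels, consensus))  # 0.75
```

With an odd number of voters, as here, ties cannot occur for binary labels; the fallback only matters if a consensus is formed from an even number of models.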
Created:
2018-11-21



