Multioutput Perturbation-Theory Machine Learning (PTML) Model of ChEMBL Data for Antiretroviral Compounds
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Multioutput_Perturbation-Theory_Machine_Learning_PTML_Model_of_ChEMBL_Data_for_Antiretroviral_Compounds/9750755
Description:
Retroviral infections, such as HIV, remain diseases with no cure.
Identifying the target proteins of new antiretroviral compounds is
a major goal of medicine and pharmaceutical chemistry.
ChEMBL holds a large, complex data set that is hard to organize.
The large number of characteristics that must be considered makes
this information difficult to analyze when predicting new drug
candidates for retroviral infections. For this reason, we propose
a new predictive model that combines perturbation theory (PT) with
machine learning (ML) modeling, creating a tool that can take
advantage of all the available information. The PTML model
proposed in this work for the ChEMBL data set of preclinical experimental
assays for antiretroviral compounds consists of a linear equation
with four variables. The PT operators used are based on multicondition
moving averages, which combine different features and make the full
data set easier to manage. More than 140 000 preclinical
assays for 56 105 compounds with different characteristics
or experimental conditions have been carried out and can be found
in the ChEMBL database, covering combinations of 359 biological activity
parameters (c0), 55 protein accessions (c1), 83 cell lines (c2),
64 assay organisms (c3), and 773 subtypes or strains. We have included 150 148 preclinical
experimental assays for HIV, 1188 for HTLV, 84 for simian
immunodeficiency virus, 370 for murine leukemia virus, 119 for Rous
sarcoma virus, and 1581 for MMTV, among others. We also included 5277
assays for hepatitis B virus. The developed PTML model reached a
sensitivity of 73.05% for training and 73.10% for validation, a specificity
of 86.61% for training and 87.17% for validation, and an accuracy of 75.84%
for training and 75.98% for validation. We also compared alternative
PTML models with different PT operators such as covariance, moments,
and exponential terms. Finally, we compared our PTML model with ML
models from the literature and with nonlinear artificial neural
network (ANN) models. We conclude that this PTML model is the first
to consider multiple characteristics of preclinical experimental antiretroviral
assays in combination, yielding a simple, useful, and adaptable tool
that could reduce the time and cost of antiretroviral drug research.
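
The multicondition moving-average operators mentioned in the description can be illustrated with a minimal sketch. It assumes the formulation commonly used in PTML work, in which each operator is the deviation of a compound's descriptor value from the mean of that descriptor over all assays sharing a given condition (for example the same activity parameter c0 or cell line c2). The records, field names, and the function `moving_average_deviations` are hypothetical and are not taken from the data set or the authors' code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: one descriptor value plus two assay conditions.
assays = [
    {"descriptor": 2.0, "c0": "IC50", "c2": "MT4"},
    {"descriptor": 4.0, "c0": "IC50", "c2": "CEM"},
    {"descriptor": 6.0, "c0": "EC50", "c2": "MT4"},
    {"descriptor": 8.0, "c0": "EC50", "c2": "MT4"},
]


def moving_average_deviations(records, value_key, condition_keys):
    """Return, per record, value minus the mean over records sharing each condition."""
    # Collect descriptor values grouped by the label of every condition.
    groups = {key: defaultdict(list) for key in condition_keys}
    for rec in records:
        for key in condition_keys:
            groups[key][rec[key]].append(rec[value_key])

    # Average of the descriptor within each condition label.
    averages = {
        key: {label: mean(values) for label, values in by_label.items()}
        for key, by_label in groups.items()
    }

    # Deviation of each record from the average of its own condition labels.
    return [
        {key: rec[value_key] - averages[key][rec[key]] for key in condition_keys}
        for rec in records
    ]


deviations = moving_average_deviations(assays, "descriptor", ["c0", "c2"])
for rec, dev in zip(assays, deviations):
    print(rec["c0"], rec["c2"], dev)
```

In a model of the kind described, deviation terms like these would serve as input variables of the linear equation; the coefficients and the exact choice of the four variables are not given in this description and are therefore not sketched here.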
Created:
2019-08-19



