PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/PTML_Combinatorial_Model_of_ChEMBL_Compounds_Assays_for_Multiple_Types_of_Cancer/7160288
Description:
Determining the target proteins of
new anticancer compounds is
a very important task in Medicinal Chemistry. In this sense, chemists
carry out preclinical assays with a high number of combinations of
experimental conditions (cj). In fact, the ChEMBL database contains outcomes of 65 534 different
anticancer activity preclinical assays for 35 565 different
chemical compounds (1.84 assays per compound). These assays cover
different combinations of cj formed from >70 different biological activity parameters (c0), >300 different drug targets (c1), >230 cell lines (c2),
and 5 organisms of assay (c3) or organisms
of the target (c4). It includes a total
of 45 833 assays in leukemia, 6227 assays in breast cancer,
2499 assays in ovarian cancer, 3499 in colon cancer, 3159 in lung
cancer, 2750 in prostate cancer, 601 in melanoma, etc. This is a very
complex data set with multiple Big Data features. Such data are hard
for researchers to rationalize in order to extract useful relationships
and predict new compounds. In this context, we propose to combine
perturbation theory (PT) ideas and machine learning (ML) modeling
to solve this combinatorial-like problem. In this work, we report
a PTML (PT + ML) model for the ChEMBL data set of preclinical assays of
anticancer compounds. This is a simple linear model with only three
variables. The model presented values of area under the receiver operating
characteristic curve = AUROC = 0.872, specificity = Sp(%) = 90.2, sensitivity = Sn(%)
= 70.6, and overall accuracy = Ac(%) = 87.7 in the training series. The
model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8 in the external
validation series. The model uses PT operators based on multicondition
moving averages to capture all the complexity of the data set. We
also compared the model with nonlinear artificial neural network (ANN)
models, obtaining similar results. This confirms the hypothesis of
a linear relationship between the PT operators and the classification
as anticancer compounds in different combinations of assay conditions.
Last, we compared the model with other PTML models reported in the
literature, concluding that this is the only PTML model able to
predict activity against multiple types of cancer. This model is a
simple but versatile tool for the prediction of the targets of anticancer
compounds taking into consideration multiple combinations of experimental
conditions in preclinical assays.
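
The description mentions PT operators built from multicondition moving averages and reports Sn(%), Sp(%), and Ac(%), but gives no formulas. The Python sketch below shows one common reading, and it is an assumption rather than the authors' implementation: each operator is taken to be the deviation of a compound descriptor from the mean of that descriptor over all assays sharing a given condition value. The descriptor name `logp`, the toy records, and both function names are hypothetical.

```python
from collections import defaultdict


def moving_average_deviations(records, descriptor_key, condition_keys):
    """Return, per record, descriptor minus its mean within each condition.

    Assumed form of a PT operator: dD(cj) = D_i - <D(cj)>, where <D(cj)>
    averages the descriptor over every record sharing the same value of cj.
    """
    sums = {k: defaultdict(float) for k in condition_keys}
    counts = {k: defaultdict(int) for k in condition_keys}
    for rec in records:
        for k in condition_keys:
            sums[k][rec[k]] += rec[descriptor_key]
            counts[k][rec[k]] += 1
    return [
        {k: rec[descriptor_key] - sums[k][rec[k]] / counts[k][rec[k]]
         for k in condition_keys}
        for rec in records
    ]


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy in percent for 0/1 labels.

    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "Sn": 100.0 * tp / (tp + fn),
        "Sp": 100.0 * tn / (tn + fp),
        "Ac": 100.0 * (tp + tn) / len(pairs),
    }


# Hypothetical toy assays: c1 = drug target, c2 = cell line.
toy_records = [
    {"logp": 2.0, "c1": "T1", "c2": "A"},
    {"logp": 4.0, "c1": "T1", "c2": "B"},
    {"logp": 3.0, "c1": "T2", "c2": "A"},
]
toy_deviations = moving_average_deviations(toy_records, "logp", ["c1", "c2"])
toy_metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Under this reading, the deviations would serve as inputs to a linear classifier, and the metrics function mirrors the standard confusion-matrix definitions of the statistics quoted in the description.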
Created:
2018-10-03



