Using Information from Historical High-Throughput Screens to Predict Active Compounds
NIAID Data Ecosystem, indexed 2026-03-08
Download link:
https://figshare.com/articles/dataset/Using_Information_from_Historical_High_Throughput_Screens_to_Predict_Active_Compounds/2270431
Description:
Modern high-throughput screening (HTS) is a well-established approach for hit finding in drug discovery. The pharmaceutical industry routinely uses it to screen more than a million compounds within a few weeks. However, the industry is shifting toward phenotypic screens, which are more disease-relevant but also more complex. As a result, the focus has moved to piloting smaller but smarter subsets that are chemically and biologically diverse, followed by an expansion around the hit compounds. One standard way to do this is to train a machine-learning (ML) model on the chemical fingerprints of the tested subset of molecules and then select the next compounds based on the model's predictions. An alternative approach is to exploit the wealth of bioactivity information contained in older (full-deck) screens. These screens can be summarized as so-called HTS fingerprints, in which each element corresponds to the outcome of a particular assay, and used as input to ML algorithms. We constructed HTS fingerprints from two collections of data: 93 in-house assays and 95 publicly available assays from PubChem. For each source, an additional set of assays (51 in-house and 46 from PubChem) was collected for testing.
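The following is a minimal sketch of how HTS fingerprints like those described above could be assembled from a flat table of screening results. Several details are hypothetical: the function name `build_hts_fingerprints`, the `(compound, assay, value)` record layout, and the outcome encoding (1.0 active, 0.0 inactive). Untested assays are filled with a placeholder value. The description does not say how missing measurements were actually handled.

```python
from collections import defaultdict


def build_hts_fingerprints(records, assay_ids, fill_value=0.0):
    """Build one fixed-length HTS fingerprint per compound (illustrative sketch).

    records: iterable of (compound_id, assay_id, value) tuples; `value` is the
        assay outcome under a hypothetical encoding (1.0 active, 0.0 inactive).
    assay_ids: ordered list of assays; position i of every fingerprint holds
        the outcome of assay_ids[i].
    fill_value: placeholder for assays a compound was not tested in (assumption).
    """
    index = {assay: i for i, assay in enumerate(assay_ids)}
    # A compound's vector is created on first use, pre-filled with fill_value.
    fps = defaultdict(lambda: [fill_value] * len(assay_ids))
    for compound, assay, value in records:
        if assay in index:  # skip assays that are not part of the fingerprint
            fps[compound][index[assay]] = value
    return dict(fps)


assays = ["A1", "A2", "A3"]
records = [
    ("cpd1", "A1", 1.0), ("cpd1", "A3", 0.0),
    ("cpd2", "A2", 1.0), ("cpd2", "A3", 1.0),
    ("cpd3", "A9", 1.0),  # A9 is outside the fingerprint, so cpd3 gets no vector
]
fps = build_hts_fingerprints(records, assays)
print(fps)
```

The resulting fixed-length vectors can be passed to any standard classifier, for example scikit-learn's `RandomForestClassifier`. With this binary encoding and `fill_value=0.0`, an untested assay cannot be told apart from an inactive outcome. A different fill value or a graded activity score may therefore be preferable.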
Three ML methods were investigated with both the HTS fingerprint and a chemical fingerprint, Morgan2: random forest (RF), logistic regression (LR), and naïve Bayes (NB). RF was found to be best suited for learning from HTS fingerprints. It yielded area under the receiver operating characteristic curve (AUC) values >0.8 for 78% of the internal assays and enrichment factors at 5% (EF(5%)) >10 for 55% of the assays. LR trained on Morgan2 was the best ML method for the chemical fingerprint, but for the majority of assays RF trained on HTS fingerprints (RF(HTS-fp)) outperformed it. In addition, HTS fingerprints were found to retrieve more diverse chemotypes. Combining the two models through heterogeneous classifier fusion led to similar or better performance than the best individual model for all assays.
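The two metrics quoted above can be computed from a model's scores and the known activity labels. This plain-Python sketch is not the authors' evaluation code. AUC uses the Mann-Whitney formulation, which is the probability that a randomly chosen active outscores a randomly chosen inactive, with ties counting as half. EF(5%) divides the hit rate among the top-scoring 5% of compounds by the overall hit rate. Rounding the size of the top 5% to the nearest integer is an assumption.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney formulation (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def enrichment_factor(scores, labels, fraction=0.05):
    """EF(x%): hit rate in the top-scoring fraction divided by the overall hit rate."""
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("need at least one active")
    # Size of the top x%; rounding to the nearest integer is an assumption.
    n_top = max(1, round(fraction * n_total))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    actives_top = sum(y for _, y in ranked[:n_top])
    return (actives_top / n_top) / (n_actives / n_total)


# Hypothetical example: 100 compounds, 5 actives (indices 0-3 and 50),
# scored so that compound i gets score 100 - i.
labels = [1 if i in (0, 1, 2, 3, 50) else 0 for i in range(100)]
scores = [100 - i for i in range(100)]
auc = roc_auc(scores, labels)
ef5 = enrichment_factor(scores, labels, fraction=0.05)
print(f"AUC = {auc:.3f}, EF(5%) = {ef5:.1f}")
```

In this example, four of the five actives land in the top 5% (5 compounds). That gives EF(5%) = (4/5) / (5/100) = 16. The pairwise AUC loop is quadratic in the number of compounds, so a rank-based formula is a better fit for full-deck screens.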
Further validation confirmed the good performance. This validation used a pair of in-house assays and data from a confirmatory screen, including a prospective set of around 2000 compounds selected with our approach. Thus, combining machine learning with HTS fingerprints and chemical fingerprints exploits information from both domains. It is a very promising approach for hit expansion, leading to more hits. The source code used with the public data is provided.
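The description does not specify the fusion rule used to combine the HTS-fingerprint and chemical-fingerprint models. The sketch below therefore substitutes one simple, common option: convert each model's scores to ranks and average them. Ranking makes the outputs of different models comparable even when their score scales differ, for example RF(HTS-fp) probabilities and LR(Morgan2) probabilities. All function names and example scores are hypothetical.

```python
def ranks_descending(scores):
    """Rank items by score (1 = highest); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def fuse_by_mean_rank(*score_lists):
    """Fuse several models' scores for the same compounds by averaging ranks.

    A lower fused value means a more promising compound.
    """
    n = len(score_lists[0])
    if any(len(s) != n for s in score_lists):
        raise ValueError("all score lists must cover the same compounds")
    all_ranks = [ranks_descending(s) for s in score_lists]
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]


def select_top(compound_ids, fused, k):
    """Pick the k compounds with the best (lowest) fused rank."""
    return [c for c, _ in sorted(zip(compound_ids, fused), key=lambda t: t[1])[:k]]


ids = ["c1", "c2", "c3", "c4"]
rf_hts = [0.9, 0.2, 0.6, 0.1]     # hypothetical RF(HTS-fp) scores
lr_morgan = [0.4, 0.3, 0.9, 0.1]  # hypothetical LR(Morgan2) scores
fused = fuse_by_mean_rank(rf_hts, lr_morgan)
print(fused, select_top(ids, fused, k=2))
```

`select_top` could then pick a prospective set for testing. When compounds have the same fused rank they keep their input order, because Python's sort is stable.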
Created: 2016-02-17



