How Diverse Are Diversity Assessment Methods? A Comparative Analysis and Benchmarking of Molecular Descriptor Space
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/How_Diverse_Are_Diversity_Assessment_Methods_A_Comparative_Analysis_and_Benchmarking_of_Molecular_Descriptor_Space/2329309
Description:
Chemical diversity selection is a widely applied approach for choosing structurally
diverse subsets of molecules, often with the objective of maximizing
the number of hits in biological screening. While many methods exist
in the area, few systematic comparisons using current descriptors,
in particular with the objective of assessing diversity in bioactivity space, have been published; this shortage
is what the current study aims to address. In this work, 13 widely
used molecular descriptors were compared, including fingerprint-based
descriptors (ECFP4, FCFP4, MACCS keys), pharmacophore-based descriptors
(TAT, TAD, TGT, TGD, GpiDAPH3), shape-based descriptors (rapid overlay
of chemical structures (ROCS) and principal moments of inertia (PMI)),
a connectivity-matrix-based descriptor (BCUT), physicochemical-property-based
descriptors (prop2D), and a more recently introduced molecular descriptor
type (namely, “Bayes Affinity Fingerprints”). We assessed
both how similarly the descriptors behave in assessing the diversity
of chemical libraries and their ability to select compounds from
libraries that are diverse in bioactivity space,
which is a property of much practical relevance in screening library
design. This is particularly relevant given that many future targets
to be screened are not known in advance, while the library should
still maximize the likelihood of containing bioactive matter
for future screening campaigns. Overall, our results showed that descriptors
based on atom topology (i.e., fingerprint-based descriptors and pharmacophore-based
descriptors) correlate well in rank-ordering compounds, both within
and between descriptor types. On the other hand, shape-based descriptors
such as ROCS and PMI showed weak correlation with the other descriptors
utilized in this study, demonstrating significantly different behavior.
We then applied eight of the molecular descriptors compared in this
study to select a diverse subset of compounds (4%) from an
initial population of 2587 compounds covering the 25 largest human
activity classes from ChEMBL, and measured the coverage of activity
classes by the subsets. Here, “Bayes Affinity
Fingerprints” achieved an average coverage of 92% of activity
classes. With ECFP4, GpiDAPH3, TGT, and random sampling,
91%, 84%, 84%, and 84% of the activity classes, respectively, were represented in
the selected compounds, followed by BCUT, prop2D, MACCS,
and PMI (in order of decreasing performance). In addition, we were
able to show that there is no visible correlation between compound
diversity in PMI space and in bioactivity space, despite frequent
utilization of PMI plots to this end. To summarize, in this work,
we assessed which descriptors select compounds with high coverage
of bioactivity space, and can hence be used for diverse compound selection
for biological screening. In cases where multiple descriptors are
to be used for diversity selection, this work describes which descriptors
behave complementarily, and can hence be used jointly to focus on
different aspects of diversity in chemical space.
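
As a rough illustration of the selection-and-coverage experiment described above, the following Python sketch picks a diverse subset and measures what fraction of activity classes it represents. The description does not state which selection algorithm or distance measure was used, so greedy MaxMin picking with Tanimoto distance on set-based fingerprints is an assumption here. The function names and the toy data are hypothetical and are not taken from the dataset.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_select(fingerprints, k, seed_index=0):
    """Greedy MaxMin picking (assumed algorithm, not confirmed by the source).

    Starting from seed_index, repeatedly add the compound whose distance
    to its nearest already-selected compound is largest.
    """
    n = len(fingerprints)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")
    selected = [seed_index]
    # min_dist[i]: distance from compound i to the nearest selected compound
    min_dist = [1.0 - tanimoto(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(selected) < k:
        chosen = set(selected)
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i in range(n):
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def class_coverage(selected, labels):
    """Fraction of distinct activity classes represented in the selection."""
    return len({labels[i] for i in selected}) / len(set(labels))


# Hypothetical toy library: six compounds in three activity classes.
fps = [{1, 2, 3}, {1, 2, 4}, {10, 11, 12}, {10, 11, 13}, {20, 21, 22}, {20, 21, 23}]
labels = ["A", "A", "B", "B", "C", "C"]

picked = maxmin_select(fps, k=3)
coverage = class_coverage(picked, labels)
print(picked, coverage)
```

In this toy case the fingerprint clusters coincide with the activity classes, so the diverse subset covers every class. The study's point is that this alignment depends on the descriptor: the same procedure run with a different descriptor, or with random sampling as a baseline, can give a different coverage.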
Created:
2016-02-18



