Do Not Hesitate to Use Tversky and Other Hints for Successful Active Analogue Searches with Feature Count Descriptors
Indexed in NIAID Data Ecosystem: 2026-03-09
Download link:
https://figshare.com/articles/dataset/Do_Not_Hesitate_to_Use_Tversky_and_Other_Hints_for_Successful_Active_Analogue_Searches_with_Feature_Count_Descriptors/2394148
Description:
This study is an exhaustive analysis of the neighborhood behavior
over a large coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands
each). It focuses on similarity-based virtual screening (SVS) success,
as defined by the ascertained optimality index. This is a weighted compromise
between purity and retrieval rate of active hits in the neighborhood
of an active query. One key issue addressed here is the impact of
Tversky's asymmetric weighting of query vs. candidate features (represented
as integer-valued ISIDA colored fragment/pharmacophore triplet count
descriptor vectors). The nearly 3/4 million independent SVS runs
showed that Tversky scores with a strong bias in favor of query-specific
features are, by far, the most successful and the least failure-prone
when compared with nine other dissimilarity scores. These include classical
Tanimoto, which failed to defend its privileged status in practical
SVS applications. Tversky performance does not depend significantly
on tuning of its bias parameter α. Both initial “guesses”
of α = 0.9 and 0.7 were more successful than Tanimoto (which was, in
turn, better than Euclid). Tversky was finally tested in exhaustive
similarity searching within a library of 1.6 M commercial and bioactive
molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, where it compared favorably to Tanimoto in terms of “scaffold hopping”
propensity. Therefore, it should be used in SVS at least as often as Tanimoto, perhaps
in parallel with it. Analysis with respect to query subclasses
highlighted relationships between query complexity (simply expressed in
terms of pharmacophore pattern counts) and/or target nature, on the one hand,
and SVS success likelihood, on the other. SVS runs using more complex queries are more robust
with respect to the choice of their operational premises (descriptors,
metric). Yet, they are best handled by “pro-query” Tversky
scores at α > 0.5. Among simpler queries, one may distinguish
between “growable” queries (allowing for active analogs with
additional features) and a few “conservative” queries
not allowing any growth. These (typically bioactive amine transporter
ligands) form the specific application domain of “pro-candidate”
biased Tversky scores at α < 0.5.
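The three kinds of scores compared in the abstract (Tversky, Tanimoto, Euclid) can be sketched on toy integer count vectors. This is a minimal sketch, not the study's code: it assumes the common count-vector generalization of Tversky, in which shared counts are Σ min(q_i, c_i), query-only counts are Σ max(q_i − c_i, 0) and candidate-only counts are Σ max(c_i − q_i, 0), weighted by α and 1 − α respectively. The abstract does not state the exact formulas, so this weighting convention, the count-based Tanimoto form and all function names (`overlap`, `tversky`, `tanimoto`, `euclid`) are assumptions. The vectors are invented stand-ins, not real ISIDA descriptors.

```python
from math import sqrt
from typing import Sequence, Tuple

# Hypothetical toy type: one integer count per descriptor feature (not real ISIDA output).
Counts = Sequence[int]


def _check(q: Counts, c: Counts) -> None:
    """Reject vectors of different length instead of letting zip() truncate silently."""
    if len(q) != len(c):
        raise ValueError("descriptor vectors must have the same length")


def overlap(q: Counts, c: Counts) -> Tuple[int, int, int]:
    """Return (shared, query_only, candidate_only) feature counts."""
    _check(q, c)
    shared = sum(min(a, b) for a, b in zip(q, c))
    query_only = sum(max(a - b, 0) for a, b in zip(q, c))
    candidate_only = sum(max(b - a, 0) for a, b in zip(q, c))
    return shared, query_only, candidate_only


def tversky(q: Counts, c: Counts, alpha: float) -> float:
    """Assumed form: shared / (shared + alpha*query_only + (1 - alpha)*candidate_only)."""
    shared, q_only, c_only = overlap(q, c)
    denom = shared + alpha * q_only + (1 - alpha) * c_only
    return shared / denom if denom else 0.0  # two all-zero vectors score 0 here


def tanimoto(q: Counts, c: Counts) -> float:
    """Assumed count-based form: q.c / (q.q + c.c - q.c)."""
    _check(q, c)
    dot = sum(a * b for a, b in zip(q, c))
    denom = sum(a * a for a in q) + sum(b * b for b in c) - dot
    return dot / denom if denom else 0.0


def euclid(q: Counts, c: Counts) -> float:
    """Euclidean distance: a dissimilarity, so smaller means closer."""
    _check(q, c)
    return sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


query = [2, 1, 0, 3]
candidate = [2, 1, 2, 3]  # every query feature kept, plus two counts of a new one
for alpha in (0.9, 0.7):
    print(f"Tversky (alpha={alpha}): {tversky(query, candidate, alpha):.3f}")
print(f"Tanimoto: {tanimoto(query, candidate):.3f}")
print(f"Euclid:   {euclid(query, candidate):.3f}")
```

For this candidate the script prints 0.968 and 0.909 for the two initial α guesses, 0.778 for Tanimoto and a Euclidean distance of 2.000. Because α = 0.9 puts only weight 0.1 on candidate-only counts, the two extra features barely lower the Tversky score, whereas Tanimoto penalizes them symmetrically.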
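The difference between “pro-query” (α > 0.5) and “pro-candidate” (α < 0.5) Tversky scores can be illustrated by scoring two hypothetical analogues of one query: a “grown” analogue that keeps every query feature and adds extra ones, and a “pruned” analogue that loses a query feature and adds nothing. The sketch assumes the count-based form shared / (shared + α·query-only + (1 − α)·candidate-only); this is our assumption, not a formula quoted from the study, and the vectors are toy examples.

```python
def tversky(q, c, alpha):
    """Assumed count-based Tversky; alpha weighs query-only, 1 - alpha candidate-only counts."""
    if len(q) != len(c):
        raise ValueError("descriptor vectors must have the same length")
    shared = sum(min(a, b) for a, b in zip(q, c))
    q_only = sum(max(a - b, 0) for a, b in zip(q, c))
    c_only = sum(max(b - a, 0) for a, b in zip(q, c))
    denom = shared + alpha * q_only + (1 - alpha) * c_only
    return shared / denom if denom else 0.0


query = [1, 1, 1, 0]
grown = [1, 1, 1, 2]   # keeps every query feature and adds two extra counts
pruned = [1, 1, 0, 0]  # adds nothing but loses one query feature

for alpha in (0.9, 0.5, 0.1):
    g, p = tversky(query, grown, alpha), tversky(query, pruned, alpha)
    closer = "grown" if g > p else "pruned"
    print(f"alpha={alpha}: grown={g:.3f} pruned={p:.3f} -> {closer} ranks higher")
```

With α = 0.9 the grown analogue ranks higher, which matches the tolerance of “growable” queries toward additional features. At α = 0.1 the pruned analogue wins by a wide margin (for these particular vectors it already wins at α = 0.5), because extra candidate features are now penalized more than missing query features, the behavior one would want for a “conservative” query.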
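The success criterion described in the abstract, a compromise between purity and retrieval rate of actives in the neighborhood of an active query, can be sketched as a ranked similarity search. The abstract gives neither the formula of the ascertained optimality index nor the way the neighborhood is delimited, so this sketch assumes a fixed top-k neighborhood and computes only the two named ingredients: purity (fraction of actives among retrieved neighbors) and retrieval rate (fraction of all actives retrieved). The weighted mean `illustrative_compromise` and its weight `w` are hypothetical placeholders, not the study's index; the count-based Tversky form, the toy library and its activity labels are likewise assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def tversky(q: Sequence[int], c: Sequence[int], alpha: float) -> float:
    """Assumed count-based Tversky; alpha weighs query-only, 1 - alpha candidate-only counts."""
    if len(q) != len(c):
        raise ValueError("descriptor vectors must have the same length")
    shared = sum(min(a, b) for a, b in zip(q, c))
    q_only = sum(max(a - b, 0) for a, b in zip(q, c))
    c_only = sum(max(b - a, 0) for a, b in zip(q, c))
    denom = shared + alpha * q_only + (1 - alpha) * c_only
    return shared / denom if denom else 0.0


def neighborhood_stats(query: Sequence[int], library: List[Sequence[int]],
                       is_active: List[bool], alpha: float, k: int) -> Tuple[float, float]:
    """Rank the whole library by similarity to the query, keep the top k (an assumed
    neighborhood definition) and return (purity, retrieval_rate) for actives in it."""
    ranked = sorted(range(len(library)),
                    key=lambda i: tversky(query, library[i], alpha),
                    reverse=True)
    neighborhood = ranked[:k]
    hits = sum(1 for i in neighborhood if is_active[i])
    n_actives = sum(1 for flag in is_active if flag)
    purity = hits / len(neighborhood) if neighborhood else 0.0  # actives among retrieved
    retrieval = hits / n_actives if n_actives else 0.0          # share of all actives found
    return purity, retrieval


def illustrative_compromise(purity: float, retrieval: float, w: float = 0.5) -> float:
    """HYPOTHETICAL weighted mean; NOT the study's ascertained optimality index."""
    return w * purity + (1 - w) * retrieval


query = [1, 1, 1, 0]  # active query, assumed already removed from the library
library = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 1, 1, 2], [0, 0, 1, 3], [1, 0, 0, 0]]
is_active = [True, False, True, False, True]  # invented labels
purity, retrieval = neighborhood_stats(query, library, is_active, alpha=0.9, k=2)
print(purity, retrieval, illustrative_compromise(purity, retrieval))
```

With α = 0.9 and k = 2, both retrieved neighbors are active (purity 1.0), while one of the three actives, the small [1, 0, 0, 0] analogue, is missed (retrieval rate 2/3). Widening k raises retrieval at the cost of purity, which is the trade-off a weighted compromise index has to arbitrate.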
Created:
2016-02-19



