Do Not Hesitate to Use Tversky and Other Hints for Successful Active Analogue Searches with Feature Count Descriptors

NIAID Data Ecosystem, indexed 2026-03-09

Download link:

https://figshare.com/articles/dataset/Do_Not_Hesitate_to_Use_Tversky_and_Other_Hints_for_Successful_Active_Analogue_Searches_with_Feature_Count_Descriptors/2394139

Description:

This study is an exhaustive analysis of neighborhood behavior over a large, coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with >50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success as defined by the ascertained optimality index, a weighted compromise between the purity and the retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighting of query vs candidate features (represented as integer-value ISIDA colored fragment/pharmacophore triplet count descriptor vectors). Nearly three-quarters of a million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are, by far, the most successful and the least failure-prone when compared with a set of nine other dissimilarity scores. These include the classical Tanimoto score, which failed to defend its privileged status in practical SVS applications. Tversky performance is not significantly conditioned by the tuning of its bias parameter α. Both initial “guesses” of α = 0.9 and 0.7 were more successful than Tanimoto (which, in turn, outperformed Euclidean distance). Tversky was eventually tested in exhaustive similarity searching within the library of 1.6 M commercial + bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, comparing favorably to Tanimoto in terms of “scaffold hopping” propensity. Therefore, it should be used at least as often as, and perhaps in parallel with, Tanimoto in SVS. Analysis with respect to query subclasses highlighted relationships between query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature on the one hand and SVS success likelihood on the other. SVS runs using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric). Yet, they are best handled by “pro-query” Tversky scores at α > 0.5. Among simpler queries, one may distinguish between “growable” queries (allowing for active analogs with additional features) and a few “conservative” queries not allowing any growth. The latter (typically bioactive amine transporter ligands) form the specific application domain of “pro-candidate” biased Tversky scores at α < 0.5.
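
To make the role of the bias parameter α concrete, here is a minimal Python sketch of a Tversky score on integer count vectors. It is not the authors' implementation. It assumes a common min/max generalization of Tversky to counts (shared content as the element-wise minimum, query-only and candidate-only content as the positive count differences) and assumes that the candidate-side weight is 1 − α. The function name `tversky_counts` and the toy vectors are hypothetical.

```python
from typing import Sequence


def tversky_counts(query: Sequence[int], candidate: Sequence[int], alpha: float) -> float:
    """Tversky similarity for non-negative integer count vectors.

    Assumptions (not taken from the source): shared content is the
    element-wise minimum; alpha weights query-only counts and
    (1 - alpha) weights candidate-only counts.
    """
    if len(query) != len(candidate):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Counts present in both vectors.
    common = sum(min(q, c) for q, c in zip(query, candidate))
    # Counts present in the query but missing from the candidate.
    query_only = sum(max(q - c, 0) for q, c in zip(query, candidate))
    # Counts present in the candidate but missing from the query.
    candidate_only = sum(max(c - q, 0) for q, c in zip(query, candidate))
    denom = common + alpha * query_only + (1.0 - alpha) * candidate_only
    return common / denom if denom > 0 else 0.0


# Toy descriptor vectors (hypothetical feature counts).
query = [2, 1, 0, 3]
grown = [2, 1, 4, 3]     # keeps every query feature and adds new ones
stripped = [2, 0, 0, 1]  # loses some query features and adds nothing

for alpha in (0.9, 0.5, 0.1):
    print(alpha,
          round(tversky_counts(query, grown, alpha), 3),
          round(tversky_counts(query, stripped, alpha), 3))
```

Under these assumptions, a "pro-query" setting (α = 0.9) ranks the grown candidate above the stripped one, because losing query features is penalized more than gaining new ones. A "pro-candidate" setting (α = 0.1) reverses the order, which matches the description of "conservative" queries that do not tolerate growth. With α = 0.5 this form reduces to the Dice coefficient rather than to Tanimoto.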

Created:

2016-02-19
