Virtual Screening with Generative Topographic Maps: How Many Maps Are Required?
Source: NIAID Data Ecosystem. Indexed: 2026-03-10
Download link:
https://figshare.com/articles/dataset/Virtual_Screening_with_Generative_Topographic_Maps_How_Many_Maps_Are_Required_/7538054
Description:
Universal generative topographic
maps (GTMs) provide two-dimensional
representations of chemical space selected for their “polypharmacological
competence”, that is, the ability to simultaneously represent
meaningful activity and property landscapes, associated with many
distinct targets and properties. Several such GTMs can be generated,
each based on a different initial descriptor vector, encoding distinct
structural features. While their average polypharmacological competence
may indeed be equivalent, they nevertheless significantly diverge
with respect to the quality of each property-specific landscape. In
this work, we show that distinct universal maps represent complementary
and strongly synergistic views of biologically relevant chemical space.
Eight universal GTMs were employed as support for predictive classification
landscapes, using more than 600 active/inactive ligand series associated
with as many targets from the ChEMBL database (v.23). For nine of
these targets, it was possible to extract, from the Directory of Useful
Decoys (DUD), truly external sets featuring sufficient “actives”
and “decoys” not present in the landscape-defining ChEMBL
ligand sets. For each such molecule, projected on every class landscape
of a particular universal map, a probability of activity was estimated,
in analogy to a virtual screening (VS) experiment. Cross-validated
(CV) balanced accuracy on landscape-defining ChEMBL data was unable
to predict the success of that landscape in VS. Thus, the universal
map with best CV results for a given property should not be prioritized
as the implicitly best predictor. For a given map, predictions for
many DUD compounds are not trustworthy, according to applicability
domain considerations. By contrast, simultaneous application of all
universal maps, and rating of the likelihood of activity as the mean
returned by all applicable maps, significantly improved prediction
results. Performance measures in consensus VS using multiple maps
were always superior or similar to those of the best individual map.
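
The consensus rule described above (score each compound by the mean activity probability over the maps that are applicable to it) can be sketched as follows. This is only an illustrative sketch: the description does not say how applicability is decided, so here a map outside its applicability domain is simply represented by `None`, and the compound names and probabilities are invented.

```python
from statistics import mean
from typing import Optional, Sequence


def consensus_score(per_map_probs: Sequence[Optional[float]]) -> Optional[float]:
    """Mean activity probability over the applicable maps only.

    A value of None marks a map whose prediction is discarded because the
    compound falls outside that map's applicability domain (assumed encoding).
    Returns None when no map is applicable.
    """
    applicable = [p for p in per_map_probs if p is not None]
    return mean(applicable) if applicable else None


# Hypothetical per-map probabilities for three compounds on four maps.
predictions = {
    "cpd_A": [0.75, None, 0.5, 0.25],
    "cpd_B": [None, None, 0.125, None],
    "cpd_C": [None, None, None, None],  # no applicable map: left unscored
}

scores = {name: consensus_score(probs) for name, probs in predictions.items()}

# Rank the scored compounds from most to least likely active.
ranked = sorted(
    (name for name, score in scores.items() if score is not None),
    key=lambda name: scores[name],
    reverse=True,
)
print(ranked)
```

Compounds with no applicable map are left out of the ranking here rather than given a default score. That is a design choice of this sketch, not something stated in the description.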
Created:
2018-12-31



