Virtual Screening with Generative Topographic Maps: How Many Maps Are Required?

NIAID Data Ecosystem, indexed 2026-03-10

Download link:

https://figshare.com/articles/dataset/Virtual_Screening_with_Generative_Topographic_Maps_How_Many_Maps_Are_Required_/7538054

Resource description:

Universal generative topographic maps (GTMs) provide two-dimensional representations of chemical space selected for their “polypharmacological competence”, that is, the ability to simultaneously represent meaningful activity and property landscapes, associated with many distinct targets and properties. Several such GTMs can be generated, each based on a different initial descriptor vector, encoding distinct structural features. While their average polypharmacological competence may indeed be equivalent, they nevertheless significantly diverge with respect to the quality of each property-specific landscape. In this work, we show that distinct universal maps represent complementary and strongly synergistic views of biologically relevant chemical space. Eight universal GTMs were employed as support for predictive classification landscapes, using more than 600 active/inactive ligand series associated with as many targets from the ChEMBL database (v.23). For nine of these targets, it was possible to extract, from the Directory of Useful Decoys (DUD), truly external sets featuring sufficient “actives” and “decoys” not present in the landscape-defining ChEMBL ligand sets. For each such molecule, projected on every class landscape of a particular universal map, a probability of activity was estimated, in analogy to a virtual screening (VS) experiment. Cross-validated (CV) balanced accuracy on landscape-defining ChEMBL data was unable to predict the success of that landscape in VS. Thus, the universal map with best CV results for a given property should not be prioritized as the implicitly best predictor. For a given map, predictions for many DUD compounds are not trustworthy, according to applicability domain considerations. By contrast, simultaneous application of all universal maps, and rating of the likelihood of activity as the mean returned by all applicable maps, significantly improved prediction results. 
Performance measures in consensus VS using multiple maps were always superior or similar to those of the best individual map.
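The consensus rule described above scores each compound by the mean probability of activity over all maps whose applicability domain contains it. The sketch below shows that rule together with the balanced-accuracy metric used for cross-validation. The abstract does not give formulas, so several parts are assumptions:

* The per-map probability is taken as the responsibility-weighted average of per-node P(active). This is a common way to read a GTM class landscape, but it is not confirmed here.
* The applicability-domain decision is supplied as a precomputed boolean, because the criterion is not specified.
* All function names are hypothetical.

```python
from typing import Optional, Sequence


def map_probability(responsibilities: Sequence[float],
                    node_p_active: Sequence[float]) -> float:
    """Hypothetical: P(active) of one compound on one class landscape, assumed to be
    the responsibility-weighted average of the per-node probabilities of activity."""
    if len(responsibilities) != len(node_p_active):
        raise ValueError("one responsibility per map node is required")
    total = sum(responsibilities)
    if total <= 0:
        raise ValueError("responsibilities must sum to a positive value")
    return sum(r * p for r, p in zip(responsibilities, node_p_active)) / total


def consensus_probability(per_map_p: Sequence[float],
                          applicable: Sequence[bool]) -> Optional[float]:
    """Mean P(active) over the maps that consider the compound inside their
    applicability domain (the AD flags are assumed to be computed elsewhere).
    Returns None when no map is applicable."""
    kept = [p for p, ok in zip(per_map_p, applicable) if ok]
    if not kept:
        return None
    return sum(kept) / len(kept)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Three hypothetical maps; the second one flags the compound as out of domain,
# so only the first and third contribute to the consensus score.
print(consensus_probability([0.75, 0.1, 0.25], [True, False, True]))  # 0.5
```

Averaging only over applicable maps means that a map which cannot reliably place a compound does not dilute or distort the score. If every map rejects the compound, the sketch returns no prediction rather than a default value.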

Created: 2018-12-31