Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening

Source: Figshare. Updated 2022-10-21; indexed 2026-04-28.

Download link:

https://figshare.com/articles/dataset/Assessment_of_the_Generalization_Abilities_of_Machine-Learning_Scoring_Functions_for_Structure-Based_Virtual_Screening/21378827

Description:

In structure-based virtual screening (SBVS), it is critical that scoring functions capture protein–ligand atomic interactions. By focusing on the local domains of ligand binding pockets, a standardized pocket Pfam-based clustering (Pfam-cluster) approach was developed to assess the cross-target generalization ability of machine-learning scoring functions (MLSFs). Subsequently, 12 typical MLSFs were evaluated using random cross-validation (Random-CV), protein sequence similarity-based cross-validation (Seq-CV), and pocket Pfam-based cross-validation (Pfam-CV). Surprisingly, all of the tested models performed progressively worse from Random-CV to Seq-CV to Pfam-CV, and none showed satisfactory generalization capacity. Our interpretability analysis suggested that MLSF predictions on novel targets depended on features related to the buried solvent-accessible surface area (SASA) of the complex structures, with higher binding affinities predicted for complexes with larger protein–ligand interfaces. By combining buried SASA-related features with target-specific patterns that were shared only among structurally similar compounds in the same cluster, the random forest (RF)-Score attained good performance in the Random-CV test. Based on these findings, we strongly advise assessing the generalization ability of MLSFs with the Pfam-cluster approach and treating the features learned by MLSFs with caution.
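
The difference between Random-CV and a cluster-based split such as Seq-CV or Pfam-CV can be illustrated with a minimal sketch. It is not the authors' protocol: the function names, the greedy fold balancing, and the cluster labels "A", "B", "C" are all hypothetical, chosen only to show that a cluster-based split keeps every cluster inside a single fold, so the test fold never contains a cluster seen in training.

```python
import random
from collections import defaultdict


def random_cv_folds(n_samples, n_folds, seed=0):
    """Random-CV: shuffle sample indices and deal them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cluster_cv_folds(cluster_ids, n_folds):
    """Cluster-based CV: all samples of one cluster land in the same fold."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption of this sketch): place the largest
    # clusters first, each into the currently smallest fold.
    for cid in sorted(members, key=lambda c: (-len(members[c]), c)):
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(members[cid])
    return folds


def clusters_shared_between_folds(folds, cluster_ids):
    """Count clusters that appear in more than one fold (0 means no overlap)."""
    seen = defaultdict(set)
    for fold_number, fold in enumerate(folds):
        for idx in fold:
            seen[cluster_ids[idx]].add(fold_number)
    return sum(1 for fold_set in seen.values() if len(fold_set) > 1)


# Hypothetical cluster labels for 12 complexes (not taken from the dataset).
cluster_ids = ["A"] * 5 + ["B"] * 4 + ["C"] * 3

random_folds = random_cv_folds(len(cluster_ids), n_folds=3)
cluster_folds = cluster_cv_folds(cluster_ids, n_folds=3)

print("clusters split across folds, random:",
      clusters_shared_between_folds(random_folds, cluster_ids))
print("clusters split across folds, cluster-based:",
      clusters_shared_between_folds(cluster_folds, cluster_ids))
```

In a random split, complexes from the same cluster typically end up in both training and test folds, which is one way a model can score well by reusing target-specific patterns. Grouping by cluster removes that overlap, at the cost of less evenly sized folds.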

Created:

2022-10-21
