All That Glitters Is Not Gold: Importance of Rigorous Evaluation of Proteochemometric Models

Source: Figshare (indexed 2026-04-28)

Download link:

https://figshare.com/articles/dataset/All_That_Glitters_Is_Not_Gold_Importance_of_Rigorous_Evaluation_of_Proteochemometric_Models/30041968

Description:

Proteochemometric models (PCMs) are used in computational drug discovery to predict bioactivity from protein and ligand representations jointly. While machine learning (ML) and deep learning (DL) have come to dominate PCMs, often serving as a basis for scoring functions, rigorous evaluation standards have not always been consistently applied. In this study, using kinase-ligand bioactivity prediction as a model system, we highlight the critical roles that data set curation, permutation testing, class imbalance, data splitting strategies that mitigate potential data leakage, and embedding quality play in determining model performance. Our findings indicate that data splitting and class imbalance are the most critical factors affecting PCM performance, underscoring how difficult it is for ML/DL-PCMs to generalize. We evaluated various protein–ligand descriptors and embeddings, including those augmented with multiple sequence alignment information. However, permutation testing consistently demonstrated that protein embeddings contributed minimally to PCM efficacy. This study advocates for the adoption of stringent evaluation standards to enhance the generalizability of models to out-of-distribution data and improve benchmarking practices.
