Coverage Score: A Model Agnostic Method to Efficiently Explore Chemical Space
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/Coverage_Score_A_Model_Agnostic_Method_to_Efficiently_Explore_Chemical_Space/20361824
Summary:
Selecting the most appropriate compounds to synthesize and test is a vital aspect of drug discovery. Methods such as clustering and diversity-based selection have weaknesses when it comes to choosing the sets that maximize information gain. Active learning techniques often rely on an initial model and computationally expensive semi-supervised batch selection. Herein, we describe a new subset-based selection method, Coverage Score, that combines Bayesian statistics and information entropy to balance representation and diversity and thereby select a maximally informative subset. Coverage Score can be influenced by prior selections and by desirable properties. In this paper, subsets selected through Coverage Score are compared against subsets selected through model-independent and model-dependent techniques on several datasets. In drug-like chemical space, Coverage Score consistently selects subsets that lead to more accurate predictions than other selection methods. Subsets selected through Coverage Score produced Random Forest models with a root-mean-square error (RMSE) up to 12.8% lower than that of models trained on randomly selected subsets, and these subsets can retain up to 99% of the structural dissimilarity of a diversity selection.
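The summary reports two figures of merit without defining them precisely. The minimal Python sketch below shows one plausible way to compute them. It assumes that the RMSE gain is a relative percentage against a random-selection baseline, and that "structural dissimilarity" is the mean pairwise Tanimoto distance over binary fingerprints, represented here as sets of on-bit indices. Both definitions are assumptions, not the authors' stated metrics. All function names are hypothetical, and the code does not implement Coverage Score itself.

```python
import math
from itertools import combinations


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_reduction(baseline, candidate):
    """Percent by which `candidate` (e.g. a model's RMSE) is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def tanimoto_distance(a, b):
    """Assumed dissimilarity: 1 - |A & B| / |A | B| for sets of fingerprint on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def mean_pairwise_dissimilarity(fps):
    """Average Tanimoto distance over all unordered pairs in a subset."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_retained(subset_fps, diversity_fps):
    """Fraction of a diversity selection's mean dissimilarity kept by another subset."""
    return mean_pairwise_dissimilarity(subset_fps) / mean_pairwise_dissimilarity(diversity_fps)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    print(relative_reduction(1.00, 0.90))
    print(tanimoto_distance({1, 2, 3}, {2, 3, 4}))
    print(diversity_retained([{1, 2}, {2, 3}, {4}], [{1}, {2}, {3}]))
```

With these definitions, a "99% of the structural dissimilarity" figure would correspond to `diversity_retained` returning about 0.99. A different fingerprint or distance measure would change the numbers but not the structure of the comparison.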
Created: 2022-07-22



