Optimization of Molecular Representativeness
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Optimization_of_Molecular_Representativeness/2280673
Description:
Representative subsets selected from within larger data sets are useful in many chemoinformatics applications, including the design of information-rich compound libraries, the selection of compounds for biological evaluation, and the development of reliable quantitative structure–activity relationship (QSAR) models. Such subsets can overcome many of the problems typical of diverse subsets, most notably the tendency of the latter to focus on outliers. Yet only a few algorithms for the selection of representative subsets have been reported in the literature. Here we report on the development of two algorithms for the selection of representative subsets from within parent data sets, based on the optimization of a newly devised representativeness function, either alone or simultaneously with the MaxMin function. The performance of the new algorithms was evaluated using several measures representing their ability to produce (1) subsets that are, on average, close to data set compounds; (2) subsets that, on average, span the same space as that spanned by the entire data set; (3) subsets mirroring the distribution of biological indications in a parent data set; and (4) test sets that are well predicted by qualitative QSAR models built on data set compounds. We demonstrate that for three data sets (containing biological indication data, logBBB permeation data, and Plasmodium falciparum inhibition data), subsets obtained using the new algorithms are more representative than subsets obtained by hierarchical clustering, k-means clustering, or MaxMin optimization in at least three of these measures.
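For readers unfamiliar with the MaxMin baseline mentioned in the description, the following is a minimal Python sketch of greedy MaxMin selection. It assumes each compound is already encoded as a numeric descriptor vector and that Euclidean distance is an adequate dissimilarity; the function name `maxmin_select` and the `start` parameter are illustrative, not taken from the dataset.

```python
import math


def maxmin_select(points, k, start=0):
    """Greedy MaxMin picking (illustrative sketch).

    Starting from points[start], repeatedly add the point whose distance
    to its nearest already-selected point is largest.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # nearest[i] = distance from point i to its closest selected point
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        candidates = [i for i in range(len(points)) if i not in selected]
        best = max(candidates, key=lambda i: nearest[i])
        selected.append(best)
        # Refresh nearest-selected distances with the newly added point
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[best]))
    return selected


# Two small clusters of 2-D "descriptor" vectors
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 9)]
print(maxmin_select(pts, 3))  # [0, 3, 1]
```

Because each step maximizes the distance to the nearest selected point, MaxMin tends to reach for isolated points early, which is the outlier bias the description attributes to diverse subsets.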
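The description does not define the authors' representativeness function, so the sketch below substitutes a hypothetical stand-in: the mean distance from every data set compound to its nearest subset member (lower means more representative), minimized by a simple greedy search. Both `mean_nearest_distance` and `greedy_representative` are illustrative names; this is a generic facility-location-style heuristic, not the algorithm reported in the paper.

```python
import math


def mean_nearest_distance(points, subset):
    """Hypothetical representativeness score (NOT the paper's function):
    average distance from each data set point to its closest subset member.
    Lower values mean the subset sits closer to the data set as a whole."""
    return sum(min(math.dist(p, points[j]) for j in subset) for p in points) / len(points)


def greedy_representative(points, k):
    """Greedily add the point that most lowers mean_nearest_distance."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    subset = []
    while len(subset) < k:
        candidates = [i for i in range(len(points)) if i not in subset]
        best = min(candidates, key=lambda i: mean_nearest_distance(points, subset + [i]))
        subset.append(best)
    return subset


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 9)]
chosen = greedy_representative(pts, 2)
print(chosen, round(mean_nearest_distance(pts, chosen), 3))
```

Unlike MaxMin, this objective rewards subset members that sit inside dense regions, so with two clusters it places one representative in each cluster rather than favoring the extremes.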
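Measure (3) asks whether a subset mirrors the distribution of biological indications in the parent data set. One simple way to quantify this, assumed here for illustration rather than taken from the paper, is the total variation distance between the two class-frequency distributions; `distribution_gap` is a hypothetical helper.

```python
from collections import Counter


def distribution_gap(parent_labels, subset_labels):
    """Total variation distance between class frequencies (illustrative).

    Returns 0.0 when the subset has exactly the parent's class proportions
    and 1.0 when the two label sets share no classes at all."""
    if not parent_labels or not subset_labels:
        raise ValueError("label lists must be non-empty")
    p, s = Counter(parent_labels), Counter(subset_labels)
    n_p, n_s = len(parent_labels), len(subset_labels)
    classes = set(p) | set(s)
    return 0.5 * sum(abs(p[c] / n_p - s[c] / n_s) for c in classes)


parent = ["a", "a", "b", "b"]
print(distribution_gap(parent, ["a", "b"]))  # 0.0
print(distribution_gap(parent, ["a", "a"]))  # 0.5
```

A smaller gap means the subset's indication profile better matches that of the parent data set.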
Created:
2016-02-17



