A Multi-Objective Genetic Algorithm for Outlier Removal
Source: NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/A_Multi_Objective_Genetic_Algorithm_for_Outlier_Removal/2094367
Resource description:
Quantitative structure-activity relationship (QSAR) or quantitative
structure-property relationship (QSPR) models are developed to correlate
the activities of sets of compounds with their structure-derived descriptors
by means of mathematical models. The presence of outliers, namely,
compounds that differ in some respect from the rest of the data set,
compromises the ability of statistical methods to derive QSAR models
with good prediction statistics. Hence, outliers should be removed
from data sets prior to model derivation. Here we present a new multi-objective
genetic algorithm for the identification and removal of outliers based
on the k-nearest neighbors (kNN)
method. The algorithm was used to remove outliers from three different
data sets of pharmaceutical interest (logBBB, factor 7 inhibitors,
and dihydrofolate reductase inhibitors), and its performance was
compared with those of five other methods for outlier removal. The
results suggest that the new algorithm provides filtered data sets
that (1) better maintain the internal diversity of the parent data
sets and (2) give rise to QSAR models with much better prediction
statistics. Equally good filtered data sets in terms of these metrics
were obtained when another objective function (termed “preservation”) was
added to the algorithm, so that certain compounds are removed only with low
probability. This option is highly useful
when specific compounds should preferably be kept in the final data
set either because they have favorable activities or because they
represent interesting molecular scaffolds. We expect this new algorithm
to be useful in future QSAR applications.
Created:
2016-02-12
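
The description above says that outliers are identified with the k-nearest neighbors (kNN) method, but it does not spell out the objective functions or the genetic search itself. The following is only a minimal sketch of one plausible kNN-based outlier score, not the authors' algorithm: each compound's activity is predicted from its k nearest neighbors in descriptor space (leave-one-out), and compounds with the largest prediction error are flagged. The function names `knn_loo_errors` and `flag_outliers`, the Euclidean distance, and the parameters `k` and `n_remove` are all assumptions made for illustration.

```python
import numpy as np


def knn_loo_errors(X, y, k=3):
    """Leave-one-out kNN prediction error for each compound.

    X: (n_compounds, n_descriptors) descriptor matrix (hypothetical input).
    y: (n_compounds,) activity values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise Euclidean distances in descriptor space (an assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A compound must not count as its own neighbor.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest other compounds for every compound.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Predict each activity as the mean activity of its neighbors.
    predictions = y[neighbors].mean(axis=1)
    return np.abs(predictions - y)


def flag_outliers(X, y, k=3, n_remove=1):
    """Return indices of the n_remove compounds with the largest kNN error."""
    errors = knn_loo_errors(X, y, k)
    return np.argsort(errors)[::-1][:n_remove]
```

In a multi-objective setting, a score of this kind would be only one objective. According to the description, the algorithm also aims to maintain the internal diversity of the parent data set and, optionally, to preserve specific compounds, so a simple ranking like `flag_outliers` should be read as a baseline rather than a substitute for the published method.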



