Prediction-Inspired Intelligent Training for the Development of Classification Read-across Structure–Activity Relationship (c-RASAR) Models for Organic Skin Sensitizers: Assessment of Classification Error Rate from Novel Similarity Coefficients
Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://figshare.com/articles/dataset/Prediction-Inspired_Intelligent_Training_for_the_Development_of_Classification_Read-across_Structure_Activity_Relationship_c-RASAR_Models_for_Organic_Skin_Sensitizers_Assessment_of_Classification_Error_Rate_from_Novel_Similarity_Coefficient/23968576
Resource description:
The advancements in the field of cheminformatics have led to a
reduction in animal testing to estimate the activity, property, and
toxicity of query chemicals. Read-across structure–activity
relationship (RASAR) is an emerging concept that utilizes various
similarity functions derived from chemical information to develop
highly predictive models. Unlike quantitative structure–activity
relationship (QSAR) models, RASAR descriptors of a query compound
are computed from its close congeners instead of the compound itself,
thus targeting predictions in the model training phase. The objective
of the present study is not to propose new QSAR models for skin sensitization
but to demonstrate the enhancement in the quality of predictions of
the skin-sensitizing potential of organic compounds by developing
classification-based RASAR (c-RASAR) models. A diverse, previously
curated data set was collected from the literature for which 2D descriptors
were computed. The extracted essential features were then used to
develop a classification-based linear discriminant analysis (LDA)
QSAR model. Furthermore, from the read-across-based predictions, RASAR
descriptors were calculated using the basic settings of the hyperparameters
for the Laplacian Kernel-based optimum similarity measure. After feature
selection, an LDA c-RASAR model was developed, which surpassed the
prediction quality of the LDA QSAR model. Various other combinations
of RASAR descriptors were also taken to develop additional c-RASAR
models, all showing better prediction quality than the LDA QSAR model
while using fewer descriptors. Several other machine learning
c-RASAR models were also developed for comparison purposes. In this
work, we have proposed and analyzed three new similarity metrics:
gm_class, sm1, and sm2. The first one is an indicator variable used
to generate a simple
univariate c-RASAR model with good prediction ability, while the remaining
two are similarity indices used to analyze possible activity cliffs
in the training and test sets and are believed to play an important
role in the modelability analysis of data sets.
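
The idea of deriving descriptors for a query compound from its close congeners can be illustrated with a minimal sketch. It assumes a Laplacian kernel of the form exp(-gamma * L1 distance) over descriptor vectors and binary class labels (1 = sensitizer, 0 = non-sensitizer). The function names, the toy data, the defaults for `k` and `gamma`, and the four output descriptors are hypothetical choices for illustration; they are not the study's settings, and they are not the definitions of gm_class, sm1, or sm2, which this listing does not give.

```python
import math


def laplacian_similarity(x, y, gamma=1.0):
    """Laplacian kernel similarity: exp(-gamma * L1 distance)."""
    l1 = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1)


def rasar_descriptors(query, train_X, train_y, k=3, gamma=1.0, exclude_index=None):
    """Illustrative similarity-based descriptors from the k closest training compounds.

    exclude_index (an assumption of this sketch) skips one training row, so a
    training compound is described by its neighbours and not by itself.
    """
    scored = [
        (laplacian_similarity(query, x, gamma), label)
        for i, (x, label) in enumerate(zip(train_X, train_y))
        if i != exclude_index
    ]
    if not scored:
        raise ValueError("no source compounds available")
    # Keep the k most similar source compounds (the close congeners).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    nearest = scored[:k]
    total = sum(s for s, _ in nearest)
    weighted_class = sum(s * label for s, label in nearest) / total if total > 0 else 0.0
    return {
        # Similarity-weighted average of neighbour class labels.
        "weighted_class": weighted_class,
        "mean_similarity": total / len(nearest),
        # Highest similarity to a positive and to a negative neighbour.
        "max_pos_similarity": max((s for s, lab in nearest if lab == 1), default=0.0),
        "max_neg_similarity": max((s for s, lab in nearest if lab == 0), default=0.0),
    }


# Toy data: two non-sensitizers near the origin, two sensitizers near (1, 1).
train_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
train_y = [0, 0, 1, 1]
desc = rasar_descriptors([1.0, 0.9], train_X, train_y, k=3)
print(desc)
```

Descriptors of this kind could then be passed to any classifier, for example an LDA model, in place of or alongside the original 2D descriptors.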
Created:
2023-08-16



