Endocrine disruption: the noise in available data adversely impacts the models' performance
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3935807
Resource description:
This paper analyzes available experimental data and builds predictive models for the binding affinity of molecules to two nuclear receptors involved in endocrine disruption (ED): the Estrogen Receptor (ER) and the Androgen Receptor (AR). The ED-relevant data were retrieved from multiple sources, including the CERAPP, CoMPARA, and Tox21 Data Challenge projects as well as the ChEMBL and PubChem databases. Data analysis using the Generative Topographic Mapping (GTM) technique revealed low agreement between experimental values reported by different sources.
The collected data were used to train classification models for AR and ER binding activity and regression models for Relative Binding Affinity (RBA) and median Inhibitory Concentration (IC50). These models performed relatively poorly in classification (sensitivities ER = 0.34, AR = 0.49) and in regression (the determination coefficient R2 of the RBA and IC50 models in external validation varied from 0.44 to 0.76). Our analysis demonstrates that the poor model performance resulted from misinterpreted experimental endpoints or wrongly reported values.
The collected data sets include 6215 (ER) and 3789 (AR) unique compounds; the data sets and the developed models are freely available.
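The two kinds of figures quoted above (sensitivity for the classifiers, determination coefficient R2 for the regressions) can be illustrated with a minimal Python sketch. It assumes the textbook definitions: sensitivity = TP / (TP + FN) with binders labelled 1, and R2 = 1 - SS_res / SS_tot. The record does not state which exact R2 variant was used in external validation, so the formula here is an assumption, and the function names and toy numbers are invented for illustration only.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true binders (label 1) that are also predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    if tp + fn == 0:
        raise ValueError("y_true contains no binders")
    return tp / (tp + fn)


def r_squared(y_obs, y_hat):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - h) ** 2 for o, h in zip(y_obs, y_hat))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1 - ss_res / ss_tot


# Toy labels and values, invented for illustration (not taken from the data set).
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 0, 1]
print(sensitivity(toy_true, toy_pred))           # one of three binders is recovered
print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A low sensitivity such as the reported 0.34 means that most experimentally labelled binders are predicted as non-binders, which is the failure mode one would expect if binder labels from different sources disagree.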
The repository includes data on estrogen and androgen receptor binding behavior (binder, non-binder), median inhibitory concentration (IC50) and relative binding affinity (RBA).
SDF fields:
DB = source database, where: COMPARA = Collaborative Modelling Project for Androgen Receptor Activity; CERAPP = Collaborative Estrogen Receptor Activity Prediction Project; Tox-DC = data from the Tox21 Data Challenge; PubChem = data from PubChem.
Set = whether the compound was used in the training set or the test set of the given model
Receptor = AR stands for Androgen Receptor and ER stands for Estrogen Receptor
binding_prp = binding behaviour for the classification model (ER and AR). 1 = binder; 0 = non-binder
IC50 (nM) and logIC50 = median inhibitory concentration values in nanomolar (nM) and on a log scale.
RBA(%) and logRBA = relative binding affinity values in percent and on a log scale.
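The fields listed above can be read from an SDF file with a few lines of standard-library Python. The sketch below assumes the usual SDF layout: each data item starts with a header line such as `> <DB>`, its value ends at the next blank line, and records are separated by `$$$$`. The function name `read_sdf_fields` and the sample record, including the `Set` value `training`, are hypothetical and not taken from the repository; a cheminformatics toolkit would be the more robust choice when the structures themselves are needed.

```python
import re

# Matches data-item headers such as "> <DB>" or ">  <IC50 (nM)>".
FIELD_HEADER = re.compile(r"^>.*?<([^<>]+)>")


def read_sdf_fields(text):
    """Return one dict of data fields per SDF record; structures are ignored."""
    records, fields, name = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":          # record separator
            records.append(fields)
            fields, name = {}, None
            continue
        match = FIELD_HEADER.match(line)
        if match:                           # start of a new data item
            name = match.group(1)
            fields[name] = ""
        elif name is not None:
            if line.strip():                # value line (values may span lines)
                fields[name] = f"{fields[name]}\n{line}" if fields[name] else line
            else:                           # a blank line ends the value
                name = None
    if fields:                              # last record without a "$$$$" line
        records.append(fields)
    return records


# Hypothetical record with an empty structure block, for illustration only.
SAMPLE_SDF = "\n".join([
    "example_compound",
    "  hypothetical",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <DB>", "CERAPP", "",
    "> <Set>", "training", "",
    "> <Receptor>", "ER", "",
    "> <binding_prp>", "1", "",
    "$$$$",
])

for record in read_sdf_fields(SAMPLE_SDF):
    print(record["DB"], record["Receptor"], record["binding_prp"])
```

All values are returned as strings, so numeric fields such as `IC50 (nM)` or `logRBA` would need an explicit `float(...)` conversion before use.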
Created:
2020-10-08



