
Endocrine disruption: the noise in available data adversely impacts the models' performance

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3935807
Resource summary:
This paper is devoted to the analysis of available experimental data and the preparation of predictive models for the binding affinity of molecules to two nuclear receptors involved in endocrine disruption (ED): the estrogen receptor (ER) and the androgen receptor (AR). The ED-relevant data were retrieved from multiple sources, including the CERAPP, CoMPARA, and Tox21 Data Challenge projects as well as the ChEMBL and PubChem databases. Data analysis performed with the Generative Topographic Mapping (GTM) technique revealed low agreement between experimental values reported by different sources. The collected data were used to train classification models for AR and ER binding activity and regression models for relative binding affinity (RBA) and median inhibitory concentration (IC50). These models performed relatively poorly both in classification (sensitivity: ER = 0.34, AR = 0.49) and in regression (the determination coefficient R² of the RBA and IC50 models in external validation ranged from 0.44 to 0.76). Our analysis demonstrates that the models' low performance resulted from misinterpreted experimental endpoints or wrongly reported values. The collected data sets include 6215 (ER) and 3789 (AR) unique compounds; both the data sets and the developed models are freely available.

The repository includes data on estrogen and androgen receptor binding behavior (binder, non-binder), median inhibitory concentration (IC50), and relative binding affinity (RBA).

SDF fields:
- DB = source database, where COMPARA = Collaborative Modelling Project for Androgen Receptor Activity; CERAPP = Collaborative Estrogen Receptor Activity Prediction Project; Tox-DC = data from the Tox21 program; PubChem = data from PubChem.
- Set = whether the compound was used in the training or the test set of the given model.
- Receptor = AR (androgen receptor) or ER (estrogen receptor) binding.
- binding_prp = binding behavior for the classification models (ER and AR): 1 = binder; 0 = non-binder.
- IC50 (nM) and logIC50 = median inhibitory concentration in nanomolar and its logarithm.
- RBA(%) and logRBA = relative binding affinity in percent and its logarithm.
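The SDF field list above can be read with a few lines of standard-library Python. The sketch below is a minimal parser under stated assumptions: it reads only the SD data fields (a `> <NAME>` header line, value lines, then a blank line) and skips the molecular structures. The `Set` values `train`/`test`, the exact `IC50 (nM)` field spelling, and both toy records are hypothetical illustrations, not content copied from the repository files. For real work, a cheminformatics toolkit such as RDKit (`Chem.SDMolSupplier`) would also parse the structures.

```python
from collections import Counter


def parse_sdf_fields(text):
    """Return one dict of SD data fields per record of an SDF string.

    Only the data-field section is read: a header line such as
    '> <binding_prp>' followed by value lines and a terminating blank line.
    The molfile block (atoms, bonds) is skipped; values are kept as strings.
    """
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the last record separator
        fields, current = {}, None
        for line in block.splitlines():
            if current is None:
                start = line.find("<")
                end = line.find(">", start + 1)
                if line.startswith(">") and start != -1 and end != -1:
                    current = line[start + 1:end]  # field name between < >
                    fields[current] = []
            elif not line.strip():
                current = None  # blank line ends the field value
            else:
                fields[current].append(line)
        records.append({name: "\n".join(v) for name, v in fields.items()})
    return records


def binder_counts(records):
    """Count binders (binding_prp == '1') per (Receptor, Set) pair."""
    counts = Counter()
    for rec in records:
        if rec.get("binding_prp", "").strip() == "1":
            counts[(rec.get("Receptor"), rec.get("Set"))] += 1
    return counts


# Two hypothetical records; the molfile block is abbreviated to 'M  END'.
TOY_SDF = """toy-1
M  END
> <DB>
CERAPP

> <Set>
train

> <Receptor>
ER

> <binding_prp>
1

$$$$
toy-2
M  END
> <DB>
COMPARA

> <Set>
test

> <Receptor>
AR

> <binding_prp>
0

> <IC50 (nM)>
250

$$$$
"""

records = parse_sdf_fields(TOY_SDF)
print(records[1]["DB"], records[1]["IC50 (nM)"])  # COMPARA 250
print(binder_counts(records))  # Counter({('ER', 'train'): 1})
```

Keeping every value as a string is deliberate: numeric fields such as `logIC50` can then be converted explicitly, and empty or non-numeric entries (one source of the noise described above) surface as conversion errors instead of being silently coerced.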
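The summary above reports classification quality as sensitivity and regression quality as the determination coefficient R². The sketch below shows the two metrics in their common textbook forms: sensitivity = TP / (TP + FN) and R² = 1 - SS_res / SS_tot. This is an assumption about the exact formulas. QSAR studies sometimes compute R² differently (for example, as a squared correlation coefficient), and this page does not state which variant the paper used. All labels and values below are hypothetical.

```python
def sensitivity(y_true, y_pred, positive=1):
    """True-positive rate TP / (TP + FN): share of actual binders predicted as binders."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive (binder) labels")
    return tp / (tp + fn)


def r_squared(y_obs, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(y_obs) != len(y_pred) or len(y_obs) < 2:
        raise ValueError("need two or more paired observations")
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical labels: 3 true binders, of which only 1 is recovered.
print(sensitivity([1, 1, 1, 0, 0], [1, 0, 0, 0, 1]))  # 0.3333333333333333
# Hypothetical logIC50 values: a constant prediction at the mean scores 0.
print(r_squared([5.0, 6.0, 7.0], [6.0, 6.0, 6.0]))  # 0.0
```

Read this way, the reported ER sensitivity of 0.34 means the classifier labels only about a third of the known ER binders as binders.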
Created:
2020-10-08