QSAR datasets - Meta-QSAR

Mendeley Data · Updated 2024-03-27 · Indexed 2024-06-26

Download link:

https://data.mendeley.com/datasets/spwgrcnjdg


Description:

We extracted 2,219 protein targets from ChEMBL, each associated with between 30 and about 6,000 drug-like chemical compounds. Each target yields one dataset with as many examples as compounds. The datasets were originally used in Olier et al., "Meta-QSAR: a large-scale application of meta-learning to drug design and discovery", Machine Learning, 2018, 107 (1), 285-311. Chemical compounds are described using a standard fingerprint representation (the most commonly used in QSAR learning), in which a Boolean variable indicates the presence or absence of a particular molecular substructure (e.g. a methyl group or a benzene ring). Specifically, we used RDKit to calculate the 1,024-bit FCFP4 fingerprint, one of the extended-connectivity fingerprints (Rogers and Hahn, 2010) for molecular characterisation. Each dataset therefore consists of 1,024 binary input variables, one per fingerprint bit, and one floating-point output variable representing the compound's activity against the target. Activity is based on IC50 values: the IC50 is the concentration of a compound required to inhibit the target by 50%. The response data have been normalised by taking the negative logarithm of the IC50 concentration (pXC50).
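The normalisation in the last sentence can be illustrated with a minimal sketch. It assumes the conventional definition, a base-10 logarithm of the concentration in molar, and it assumes IC50 inputs in nanomolar; the description above states neither the base nor the unit, so both are assumptions. The function name `pic50_from_nm` is hypothetical and is not part of the dataset or of any library.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to a negative log10 molar value.

    Assumption: -log10(IC50 in mol/L). Since 1 nM = 1e-9 M, this equals
    9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


# Illustrative values only, not taken from the datasets.
examples_nm = [1.0, 100.0, 10_000.0]
pic50_values = [round(pic50_from_nm(v), 2) for v in examples_nm]
```

Under this convention a lower IC50 (a more potent compound) gives a higher output value, so a regression model trained on these datasets predicts larger numbers for stronger inhibitors.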

Created:

2024-01-23
