juppy44/chembl-full
Source: Hugging Face (updated 2026-03-23, indexed 2026-03-29)
Download link: https://hf-mirror.com/datasets/juppy44/chembl-full
---
license: cc-by-sa-3.0
language:
- en
tags:
- chemistry
- drug-discovery
- bioactivity
- chembl
- smiles
- molecular-properties
size_categories:
- 1M<n<10M
---
# ChEMBL Full Dataset
A flat, fully-joined export of the ChEMBL database (1,980 rows, 139 columns).
## What's included
Each row is one **bioactivity measurement** (IC50, Ki, EC50, Kd, GI50, etc.) enriched with columns from the following sources:
| Source | Key columns |
|---|---|
| Activity | `pchembl_value`, `standard_type/value/units`, `activity_comment`, `ligand_efficiency` |
| Molecule | `canonical_smiles`, `standard_inchi`, all physicochemical properties, `max_phase`, `indication_class`, synonyms |
| Assay | `description` (free text), `confidence_score`, `assay_type`, cell/tissue/organism context |
| Target | `pref_name`, `target_type`, `uniprot_accession`, `component_description` |
| Mechanism | `mechanism_of_action`, `action_type`, `mechanism_comment`, `selectivity_comment` |
| Drug indications | `mesh_headings`, `efo_terms`, `max_phase_for_ind` |
| Document | `doc__title`, `doc__abstract`, `doc__doi`, `doc__pubmed_id` |
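
The `pchembl_value` column is ChEMBL's negative base-10 logarithm of a molar potency value (IC50, Ki, EC50, Kd and similar). As a rough illustration of how it relates to a raw measurement, the sketch below converts a concentration in nM to that scale. The function name `pchembl_from_nm` is hypothetical, and the sketch assumes that `standard_type/value/units` expands to separate `standard_value` and `standard_units` columns and that the value passed in is in nM.

```python
import math
from typing import Optional


def pchembl_from_nm(standard_value_nm: Optional[float]) -> Optional[float]:
    """Return -log10 of a molar concentration supplied in nM.

    Assumption: the caller has already checked that the units are nM.
    Missing or non-positive values have no logarithm, so None is returned.
    """
    if standard_value_nm is None or standard_value_nm <= 0:
        return None
    # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(standard_value_nm)
```

For example, a measurement of 100 nM corresponds to a value of 7 on this scale.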
## Usage
```python
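from datasets import load_dataset

# Download the dataset from the Hugging Face Hub, then convert the train split to a pandas DataFrame
ds = load_dataset("juppy44/chembl-full")
df = ds["train"].to_pandas()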
```
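
Once the data is loaded, a common next step is to keep only measurements of one type above a potency threshold. The following is a minimal sketch of such a row predicate, not part of the dataset itself. The name `is_potent_ic50` and the default threshold of 6.0 are illustrative choices, and the column name `standard_type` is assumed from the `standard_type/value/units` entry in the table above.

```python
from typing import Any, Mapping


def is_potent_ic50(row: Mapping[str, Any], min_pchembl: float = 6.0) -> bool:
    """Return True for IC50 rows whose pchembl_value is at least min_pchembl.

    Assumption: rows expose `standard_type` and `pchembl_value` keys.
    Missing or non-numeric pchembl values are treated as not potent.
    """
    if row.get("standard_type") != "IC50":
        return False
    try:
        value = float(row.get("pchembl_value"))
    except (TypeError, ValueError):
        # None or unparseable text: no usable potency value
        return False
    return value >= min_pchembl
```

Because the predicate takes a single row mapping, it should work with `ds["train"].filter(is_potent_ic50)` from the `datasets` library, or with a plain list comprehension over row dictionaries.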
## License
ChEMBL data is provided under [CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/).
Cite: Zdrazil et al., *Nucleic Acids Research* 2023. DOI: 10.1093/nar/gkad1004
Provider: juppy44



