KinFragLib: Combinatorial library
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link: https://zenodo.org/record/3954931
Description:
KinFragLib: Exploring the Kinase Inhibitor Space Using Subpocket-Focused Fragmentation and Recombination.
Project description.
Protein kinases play a crucial role in many cell signaling processes, making them one of the most important families of drug targets. In this context, fragment-based drug design strategies have been successfully applied to develop novel kinase inhibitors, usually following a knowledge-driven approach to optimize a focused set of fragments to a potent kinase inhibitor.
Alternatively, KinFragLib is a new method for exploring and extending the chemical space of kinase inhibitors using data-driven fragmentation and recombination, built on available structural kinome data from the KLIFS database for over 3,200 kinase DFG-in complexes. The computational fragmentation method splits the co-crystallized non-covalent kinase inhibitors into fragments according to their 3D proximity to six predefined, functionally relevant subpocket centers. The resulting fragment library consists of six subpocket pools with over 9,000 fragments, available at https://github.com/volkamerlab/KinFragLib.
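The proximity-based assignment described above can be illustrated with a minimal sketch. It is not the KinFragLib implementation: the subpocket labels and center coordinates are made up for illustration, and assigning single atoms (rather than whole fragments) to the nearest center is a simplification chosen here.

```python
import math

# Hypothetical subpocket centers: labels and coordinates are illustrative
# assumptions, not values taken from KinFragLib or KLIFS.
SUBPOCKET_CENTERS = {
    "subpocket_1": (0.0, 0.0, 0.0),
    "subpocket_2": (6.0, 0.0, 0.0),
    "subpocket_3": (0.0, 6.0, 0.0),
    "subpocket_4": (0.0, 0.0, 6.0),
    "subpocket_5": (-6.0, 0.0, 0.0),
    "subpocket_6": (0.0, -6.0, 0.0),
}


def nearest_subpocket(xyz, centers=SUBPOCKET_CENTERS):
    """Return the label of the center closest to a 3D coordinate."""
    return min(centers, key=lambda label: math.dist(xyz, centers[label]))


def assign_atoms(atom_coords, centers=SUBPOCKET_CENTERS):
    """Group atom indices by their nearest subpocket center."""
    pools = {label: [] for label in centers}
    for index, xyz in enumerate(atom_coords):
        pools[nearest_subpocket(xyz, centers)].append(index)
    return pools


# Three made-up atom positions of a toy ligand.
ligand_atoms = [(0.5, 0.2, 0.0), (5.1, 0.3, 0.0), (0.2, 5.4, 0.1)]
print(assign_atoms(ligand_atoms))
```

Each of the six pools in the returned dictionary collects the atoms closest to one center, mirroring the idea of six subpocket pools on a toy scale.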
KinFragLib offers two main applications: (i) in-depth analyses of the chemical space of known kinase inhibitors, subpocket characteristics and connections, and (ii) subpocket-informed recombination of fragments to generate potential novel inhibitors. The latter showed that recombining only a subset of 727 representative fragments generated a combinatorial library of 11.3 million molecules, containing, besides some known kinase inhibitors, more than 99% novel chemical matter compared to ChEMBL and 55% of molecules compliant with Lipinski's rule of five.
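For readers unfamiliar with Lipinski's rule of five, the sketch below checks its four standard criteria on precomputed descriptor values. The function names and the `max_violations` parameter are assumptions introduced here. Whether the library analysis tolerates a violation is not stated in this description, so the strict variant (no violations) is the default.

```python
def ro5_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four Ro5 criteria a molecule violates."""
    checks = [
        mol_weight <= 500,  # molecular weight in Da
        logp <= 5,          # octanol-water partition coefficient
        h_donors <= 5,      # hydrogen bond donors
        h_acceptors <= 10,  # hydrogen bond acceptors
    ]
    return sum(not ok for ok in checks)


def fulfills_ro5(mol_weight, logp, h_donors, h_acceptors, max_violations=0):
    """Return True if the number of violations stays within the allowed limit.

    max_violations is a hypothetical knob; some analyses allow one violation.
    """
    violations = ro5_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations


# Made-up descriptor values for two toy molecules.
print(fulfills_ro5(350.4, 3.2, 2, 5))
print(fulfills_ro5(620.0, 6.1, 2, 5))
```

In practice the descriptor values would come from a cheminformatics toolkit. They are passed in directly here so that the example has no external dependencies.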
Combinatorial library dataset.
The dataset offered here is part of the KinFragLib GitHub repository (https://github.com/volkamerlab/KinFragLib) and contains the metadata and properties of the KinFragLib combinatorial library.
1. Raw data
combinatorial_library.json: Full combinatorial library. For detailed information about this data format, refer to notebooks/4_1_combinatorial_library_data_preparation.ipynb at https://github.com/volkamerlab/KinFragLib.
combinatorial_library_deduplicated.json: Deduplicated combinatorial library (based on InChIs).
chembl_standardized_inchi.csv: Standardized ChEMBL 33 molecules in the form of InChI strings.
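Deduplication based on InChIs can be sketched as follows. The record layout (a list of dictionaries with an `inchi` key) is an assumption made for this example. The actual JSON format is documented in the notebook referenced above and may differ.

```python
def deduplicate_by_inchi(records, key="inchi"):
    """Keep the first record for each InChI string, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        identifier = record[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(record)
    return unique


# Toy records; the "ligand_id" and "inchi" field names are hypothetical.
library = [
    {"ligand_id": 0, "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
    {"ligand_id": 1, "inchi": "InChI=1S/CH4/h1H4"},
    {"ligand_id": 2, "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
]
deduplicated = deduplicate_by_inchi(library)
print(len(library), len(deduplicated))
```

Using the InChI string as the key means two recombination paths that yield the same molecule collapse into a single entry.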
2. Processed data
Data extracted from combinatorial_library_deduplicated.json. The extraction is performed in notebooks/4_1_combinatorial_library_data_preparation.ipynb at https://github.com/volkamerlab/KinFragLib.
n_atoms.csv: Number of atoms for each recombined ligand.
ro5.csv: Number of ligands that fulfill Lipinski's rule of five (Ro5) and its individual criteria; number of ligands in total.
subpockets.csv: Number of ligands per subpocket combination.
original_exact.json: Ligands with exact matches in original ligands, i.e. KLIFS ligands that were used for the fragmentation.
original_substructure.json: Ligands with substructure matches in original ligands, i.e. KLIFS ligands that were used for the fragmentation.
chembl_exact.json: Ligands with exact matches in ChEMBL.
chembl_most_similar.json: Most similar ligand in ChEMBL for each recombined ligand.
chembl_highly_similar.json: Most similar ligand in ChEMBL for each recombined ligand, restricted to cases with a similarity greater than 0.9.
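The relation between the last two files can be pictured as a threshold filter. The sketch below assumes a hypothetical record layout with a `similarity` field. The identifiers are placeholders, and the similarity measure itself is not specified in this description.

```python
def highly_similar(most_similar, threshold=0.9):
    """Keep only entries whose similarity is strictly greater than the threshold."""
    return [entry for entry in most_similar if entry["similarity"] > threshold]


# Toy records; field names and identifiers are hypothetical placeholders.
most_similar = [
    {"ligand_id": 0, "chembl_id": "PLACEHOLDER_A", "similarity": 0.95},
    {"ligand_id": 1, "chembl_id": "PLACEHOLDER_B", "similarity": 0.90},
    {"ligand_id": 2, "chembl_id": "PLACEHOLDER_C", "similarity": 0.42},
]
print(highly_similar(most_similar))
```

Note that an entry with a similarity of exactly 0.9 is excluded, following the "greater than 0.9" wording.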
Usage.
This dataset can be used to run the notebooks available at https://github.com/volkamerlab/KinFragLib.
1. Clone the KinFragLib repository.
2. Download the tar.bz2 file provided here.
3. Extract the archive content to the combinatorial library folder in your local KinFragLib folder:
tar -xvf combinatorial_library.tar.bz2 -C /path_to_kinfraglib/data/combinatorial_library/
4. Run the notebooks.
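The extraction step can also be done from Python with the standard library, for example from within a notebook. This is a minimal sketch: `extract_library` is a helper name introduced here, not part of KinFragLib.

```python
import tarfile
from pathlib import Path


def extract_library(archive_path, target_dir):
    """Extract a bzip2-compressed tar archive and list the extracted entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter where the running Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:bz2") as archive:
        archive.extractall(target, **kwargs)
    # Return the names of the top-level entries now present in the target folder.
    return sorted(path.name for path in target.iterdir())
```

Calling `extract_library("combinatorial_library.tar.bz2", "/path_to_kinfraglib/data/combinatorial_library/")` mirrors the tar command, with the placeholder path replaced by the location of your local clone.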
Citation.
This dataset is part of the KinFragLib publication:
Sydow, D., Schmiel, P., Mortier, J., and Volkamer, A. KinFragLib: Exploring the Kinase Inhibitor Space Using Subpocket-Focused Fragmentation and Recombination. J. Chem. Inf. Model. 2020. https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00839
Created: 2024-03-21



