Structure-Function Analysis Tools for Drug Design
National Basic Science Data Center (国家基础学科公共科学数据中心), indexed 2026-01-03
Download link:
https://nbsdc.cn/general/dataDetail?id=6952a5af195d266fa53fe871&type=1
Resource description:
The KarmaDock dataset is a structured benchmark resource for protein-ligand docking of small molecules and for virtual screening, built on established structural databases such as PDBBind 2020. Each record corresponds to one experimentally resolved protein-ligand complex and contains the protein's 3D structure, the ligand structure, and the experimentally measured binding affinity. Pocket identification, coordinate normalization, and residue filtering are applied uniformly during preprocessing. KarmaDock also provides the accompanying graph structures and docking results: the original protein-ligand structures are converted into DGL graph files whose node and edge features encode atom types, bond types, and spatial relationships, and the predicted binding poses (SDF) and scoring results (CSV) map one-to-one to the label files via complex IDs. The data are organized hierarchically as "original structure, graph representation, docking results, analysis labels". The dataset can serve as training and validation data for deep learning docking models, and it is also suited to evaluating new methods on binding pose prediction, affinity regression, and virtual screening enrichment.
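The one-to-one mapping between scoring results and label files through complex IDs can be illustrated with a minimal sketch. The column names (`complex_id`, `score`, `affinity`), the function name, and the in-memory sample rows are all hypothetical placeholders chosen for illustration; the dataset's actual file layout and headers may differ and should be checked against the downloaded files.

```python
import csv
import io


def join_scores_with_labels(scores_file, labels_file, id_col="complex_id"):
    """Inner-join a scores table and a labels table on a shared complex ID.

    Both arguments are open text files (or file-like objects) in CSV format.
    The column name in `id_col` is an assumption, not the dataset's schema.
    """
    # Index the label rows by complex ID for constant-time lookup.
    labels = {row[id_col]: row for row in csv.DictReader(labels_file)}

    joined = []
    for row in csv.DictReader(scores_file):
        label = labels.get(row[id_col])
        if label is None:
            continue  # scored complex with no matching label: skip it
        merged = dict(label)
        merged.update(row)  # score columns are added to the label columns
        joined.append(merged)
    return joined


# Placeholder tables standing in for the real CSV files (values are invented).
scores_csv = io.StringIO("complex_id,score\ncpx001,72.5\ncpx002,41.0\ncpx999,10.0\n")
labels_csv = io.StringIO("complex_id,affinity\ncpx001,7.2\ncpx002,5.1\n")

rows = join_scores_with_labels(scores_csv, labels_csv)
for r in rows:
    print(r["complex_id"], r["score"], r["affinity"])
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline="")` handles. Values are kept as strings here; converting them to `float` would be the next step before computing any regression or enrichment metric.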
Provided by:
Peking University



