PPB-Affinity: Protein-Protein Binding Affinity dataset for AI-based protein drug discovery

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/11070823
Description:
Prediction of protein-protein binding (PPB) affinity plays an important role in large-molecule drug discovery. Deep learning (DL) has been adopted to predict changes in PPB affinity upon mutation, but few studies have predicted the PPB affinity itself. The major reason is the paucity of open-source datasets with PPB affinity data. To address this gap, the current study introduces a large, comprehensive PPB affinity (PPB-Affinity) dataset. The PPB-Affinity dataset contains key information such as crystal structures of protein-protein complexes (with or without protein mutation patterns), PPB affinity, receptor protein chain, ligand protein chain, etc. To the best of our knowledge, this is the largest publicly available PPB affinity dataset, and we believe it will significantly advance drug discovery by streamlining the screening of potential large-molecule drugs. We also developed a deep-learning benchmark model with this dataset to predict the PPB affinity, providing a baseline for comparison by the research community.

The code for PPB-Affinity database preparation is available at https://github.com/Huatsing-Lau/PPB-Affinity-DataPrepWorkflow. The code for the benchmark algorithm is available at https://github.com/ChenPy00/PPB-Affinity. The article related to the PPB-Affinity dataset can be found at https://doi.org/10.1038/s41597-024-03997-4.

Files are organized as follows:
- PPB-Affinity.xlsx
- samples_deleted.zip
- PPB-Affinity-AF.zip
- PDB/
  - Affinity Benchmark v5.5/
    - file1.pdb
    - file2.pdb
    - ...
    - filek.pdb
  - ATLAS/
  - PDBbind v2020/
  - SAbDab/
  - SKEMPIv2.0/
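As a quick sanity check after downloading, the file layout described above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the function name `count_pdb_files` is hypothetical, and it assumes the archive has been unpacked locally so that a `PDB/` directory exists with one subfolder per source database and structure files ending in `.pdb`.

```python
from pathlib import Path


def count_pdb_files(pdb_root):
    """Count .pdb files under each source subfolder of the PDB/ directory.

    Assumption: pdb_root points at a local copy of the dataset's PDB/
    folder, with one subfolder per source (e.g. ATLAS/, SAbDab/).
    """
    root = Path(pdb_root)
    counts = {}
    # Only subfolders are treated as sources; loose files in the root are ignored.
    for source_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also finds files in nested folders, in case a source
        # keeps its structures one level deeper.
        counts[source_dir.name] = sum(1 for _ in source_dir.rglob("*.pdb"))
    return counts
```

Calling `count_pdb_files("PDB")` from the directory where the files were unpacked would return a dictionary mapping each source folder name to its number of structure files, which can then be compared against the entries listed in PPB-Affinity.xlsx.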

Created:
2024-12-04