DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4815266
Description:
This dataset contains replication data for the paper "DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction". It consists of pickled Pandas DataFrame files that can be used to train and validate protein interface prediction models. For users' convenience, it also includes the externally generated residue-level PSAIA and HH-suite3 features (e.g., raw MSAs and profile HMMs for each protein complex). Our GitHub repository, linked in the "Additional notes" metadata section below, describes in more detail how we parsed these files to create the training and validation datasets. The repository also includes scripts for imputing missing feature values and for converting the final "raw" complexes into DGL-compatible graph objects.
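Because the description says the data ships as pickled files, a first look at a downloaded file might go roughly as follows. This is a minimal sketch and not the authors' loading code: the file name `example_complex.pkl`, the helper names `load_pickled` and `describe`, and the stand-in dictionary keys are all hypothetical, and the actual file layout is documented in the DIPS-Plus GitHub repository. The demo pickles a small stand-in object so that it runs without the dataset.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickled(path):
    """Unpickle one file.

    Only use this on files from a source you trust: unpickling can
    execute arbitrary code embedded in the file.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Return a short summary: the type name plus a shape, if the object has one."""
    return {"type": type(obj).__name__, "shape": getattr(obj, "shape", None)}


# Stand-in demo: a plain dict is pickled to a temporary file so the sketch
# runs without the dataset. The keys below are hypothetical, not real columns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "example_complex.pkl"  # hypothetical file name
    with open(demo_path, "wb") as handle:
        pickle.dump({"residue": ["ALA", "GLY"], "label": [0, 1]}, handle)

    loaded = load_pickled(demo_path)
    summary = describe(loaded)
    print(summary)
```

For a file that really holds a pickled Pandas DataFrame, pandas must be installed for the unpickling to succeed; `describe` would then report a `(rows, columns)` shape, and `pandas.read_pickle(path)` is the more direct call.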
Created:
2021-10-06



