vector-institute/atom3d-ppi
Source: Hugging Face. Updated 2024-07-15; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/vector-institute/atom3d-ppi
Description:
The PPI (protein-protein interface) dataset supports the task of predicting which pairs of amino acids, spanning two different proteins, will interact upon binding. Two amino acids are defined as interacting if any of their heavy atoms are within 6 Å of one another. The dataset includes the following features: input_ids, the atomic numbers of both proteins in the complex, concatenated together; coords, the 3D coordinates of both proteins in the complex, concatenated together; labels, indicating whether an amino acid pair interacts (1 for interacting, 0 for not interacting); and token_type_ids, a mask marking which input_ids/coords entries belong to each of the two proteins in the complex (0 for protein 1, 1 for protein 2). The dataset is split into training, validation, and test sets containing 87303, 31050, and 15268 samples, respectively.
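The labeling rule and the role of the token_type_ids mask can be illustrated with a minimal Python sketch. It runs on hypothetical toy values, not on records from the dataset. The helper names (`split_by_protein`, `heavy_atoms`, `interacts`) are invented for illustration. The sketch also makes several assumptions: "heavy atom" means any atom with atomic number greater than 1, the 6 Å cutoff is treated as inclusive, and each mask group in the toy data stands for a single amino acid.

```python
import math

CUTOFF_ANGSTROM = 6.0  # threshold from the dataset description; inclusive here by assumption


def split_by_protein(input_ids, coords, token_type_ids):
    """Separate concatenated atoms into (protein 1, protein 2) using the 0/1 mask."""
    proteins = ([], [])
    for atomic_number, xyz, which in zip(input_ids, coords, token_type_ids):
        proteins[which].append((atomic_number, xyz))
    return proteins


def heavy_atoms(atoms):
    """Keep coordinates of non-hydrogen atoms (atomic number > 1, an assumption)."""
    return [xyz for atomic_number, xyz in atoms if atomic_number > 1]


def interacts(atoms_a, atoms_b, cutoff=CUTOFF_ANGSTROM):
    """Return 1 if any heavy-atom pair is within `cutoff` angstroms, else 0."""
    for a in heavy_atoms(atoms_a):
        for b in heavy_atoms(atoms_b):
            if math.dist(a, b) <= cutoff:
                return 1
    return 0


# Toy example: hypothetical values, not taken from the dataset.
input_ids = [7, 6, 1, 8, 6]  # N, C, H, O, C
coords = [
    (0.0, 0.0, 0.0),
    (1.5, 0.0, 0.0),
    (2.0, 0.0, 0.0),   # hydrogen, ignored by the heavy-atom filter
    (7.0, 0.0, 0.0),
    (20.0, 0.0, 0.0),
]
token_type_ids = [0, 0, 0, 1, 1]  # 0 = protein 1, 1 = protein 2

protein_1, protein_2 = split_by_protein(input_ids, coords, token_type_ids)
label = interacts(protein_1, protein_2)
print(label)
```

In the toy data, the carbon at x = 1.5 and the oxygen at x = 7.0 are 5.5 Å apart, so the pair is labeled as interacting. How the published records group atoms into amino acid pairs is not stated in the description above, so real samples may need a different grouping step.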
Provided by:
vector-institute



