
AbSet: A Standardized Dataset of Antibody Structures for Machine Learning Applications

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14888001
Description:
AbSet is a dataset of antibody structures extracted from the PDB, carefully standardized, and enriched with a subset of in silico-generated antibody-antigen complexes containing poses similar to the bound state, along with a novel set of decoys. In total, AbSet comprises over 800,000 structures, encompassing antibodies with paired heavy and light chains (VH-VL), heavy chains only (VH), light chains only (VL), and single-chain variable fragments (scFv), including both free antibodies and those complexed with protein antigens.

The in silico dataset was generated by molecular docking with HADDOCK, following two distinct approaches:

Blind Docking: conducted on 2,135 experimentally determined antibody-antigen complexes.
Site-Directed Docking: applied to 1,755 complexes, where the antibody sequences were extracted, modeled using ABodyBuilder2, and then docked with their original crystallized antigen.

Each docking run produced 250 poses, which were classified into four quality categories based on DockQ: high quality, medium quality, acceptable quality, and incorrect.

The dataset also includes molecular descriptors of amino acid residues. These descriptors were calculated for all standardized antibody structures obtained from the PDB. For the in silico structures generated via docking, descriptors were computed for 4 selected structures out of the 250 poses generated per system.
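The four DockQ quality categories mentioned above can be illustrated with a minimal Python sketch. The cut-offs used here (0.23, 0.49, 0.80) are the conventional DockQ thresholds and are an assumption on our part; this listing does not state which thresholds AbSet applied. The function names `dockq_category` and `summarize_poses` are hypothetical helpers, not part of the AbSet code.

```python
from collections import Counter

# Conventional DockQ cut-offs (assumed; not stated in the AbSet description).
HIGH_CUTOFF = 0.80
MEDIUM_CUTOFF = 0.49
ACCEPTABLE_CUTOFF = 0.23


def dockq_category(score: float) -> str:
    """Map a DockQ score in [0, 1] to one of the four quality labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"DockQ score must lie in [0, 1], got {score}")
    if score >= HIGH_CUTOFF:
        return "high"
    if score >= MEDIUM_CUTOFF:
        return "medium"
    if score >= ACCEPTABLE_CUTOFF:
        return "acceptable"
    return "incorrect"


def summarize_poses(scores):
    """Count how many poses of one docking run fall into each category."""
    return Counter(dockq_category(s) for s in scores)


if __name__ == "__main__":
    # Made-up scores for illustration only; a real run would have 250 poses.
    example_scores = [0.05, 0.23, 0.48, 0.49, 0.79, 0.80, 0.95]
    for label, count in sorted(summarize_poses(example_scores).items()):
        print(label, count)
```

A per-run summary of this kind is one way to check how many near-native poses versus decoys a given system contributes before sampling training examples.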
The code used to calculate molecular descriptors is available in the GitHub repository. The descriptors include:

Solvent Accessible Surface Area
Relative Accessible Surface Area
Atomic depth
Protrusion index
Hydrophobicity
Sequence
Half-sphere exposure calculations
Cα coordinates
ϕ and ψ dihedral angles
Secondary structure of the protein

Organization of Available Data:

📂 PDBs_Files (Antibodies extracted from the PDB)
│── 📂 Structures
│── 📂 Descriptors

📂 InSilicoComplexStructures-MonomersFromXtal (Blind Docking)
│── 📂 Structures
│── 📂 Descriptors
│── 📂 Index DockQ

📂 InSilicoComplexStructures-MonomersFromModeling (Site-Directed Docking)
│── 📂 Structures
│── 📂 Descriptors
│── 📂 Index DockQ
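As a rough illustration of the folder organization described above, the following Python sketch checks whether a local copy of the dataset contains the expected directories. It assumes that the downloaded archives unpack into folders named exactly as in the listing (including the space in "Index DockQ"), which this page does not confirm; `EXPECTED_LAYOUT` and `missing_folders` are hypothetical names introduced for the example.

```python
import sys
from pathlib import Path

# Folder names copied from the listing; that they match the unpacked
# archives literally is an assumption.
EXPECTED_LAYOUT = {
    "PDBs_Files": ["Structures", "Descriptors"],
    "InSilicoComplexStructures-MonomersFromXtal": [
        "Structures", "Descriptors", "Index DockQ",
    ],
    "InSilicoComplexStructures-MonomersFromModeling": [
        "Structures", "Descriptors", "Index DockQ",
    ],
}


def missing_folders(root):
    """Return the expected sub-folders (as POSIX-style relative paths)
    that are not present as directories under `root`."""
    root = Path(root)
    missing = []
    for top, subfolders in EXPECTED_LAYOUT.items():
        for sub in subfolders:
            relative = Path(top) / sub
            if not (root / relative).is_dir():
                missing.append(relative.as_posix())
    return missing


if __name__ == "__main__":
    # Usage: python check_layout.py /path/to/local/copy
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "."
    gaps = missing_folders(dataset_root)
    if gaps:
        print("Missing folders:")
        for path in gaps:
            print("  " + path)
    else:
        print("All expected folders are present.")
```

Running a check like this before training helps catch a partial download early, since the three top-level folders are distributed as separate parts of the record.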
Created:
2025-02-25