AbSet: A Standardized Data Set of Antibody Structures for Machine Learning Applications
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/AbSet_A_Standardized_Data_Set_of_Antibody_Structures_for_Machine_Learning_Applications/29031919
Resource description:
Machine learning algorithms have played a fundamental role in the
development of therapeutic antibodies by being trained on data sets
of sequences and/or structures. However, structural data sets remain
limited, especially those that include antibody–antigen complexes.
Additionally, many of the available structures are not standardized,
and antibody-specific databases often do not provide molecular descriptors
that could enhance ML models. To address this gap, we introduce AbSet,
a curated dataset comprising over 800,000 antibody structures and
corresponding molecular descriptors, including both experimentally
determined and in silico-generated antibody–antigen complexes.
We systematically retrieved antibody structures from the Protein Data
Bank (PDB), applied rigorous standardization protocols, and expanded
the dataset through large-scale protein–protein docking to
generate structural variants of antibody–antigen interactions.
Each model was classified as high, medium, acceptable, or incorrect
quality based on structural similarity to reference experimental complexes.
This classification enables both the construction of a decoy set of
confirmed non-binders and the generation of high-confidence augmented
structural data for machine learning applications. AbSet is publicly
available via the Zenodo repository, with accompanying scripts hosted
on GitHub (https://github.com/SFBBGroup/AbSet.git).
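
The four-way quality classification described above can be illustrated with a minimal sketch. The description does not state which similarity metric or cutoffs AbSet uses, so this example assumes a DockQ-style score in [0, 1] and the commonly cited DockQ bands (0.23, 0.49, 0.80); the function names and example scores are hypothetical and are not taken from the AbSet scripts.

```python
from collections import defaultdict

# Assumed cutoffs: the commonly cited DockQ quality bands.
# These are NOT confirmed to be the thresholds used by AbSet.
QUALITY_BANDS = [(0.80, "high"), (0.49, "medium"), (0.23, "acceptable")]


def classify_model(score: float) -> str:
    """Map a similarity score in [0, 1] to a quality class."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for cutoff, label in QUALITY_BANDS:
        if score >= cutoff:
            return label
    return "incorrect"


def split_models(scores: dict[str, float]) -> dict[str, list[str]]:
    """Group model identifiers by their quality class."""
    groups: dict[str, list[str]] = defaultdict(list)
    for model_id, score in scores.items():
        groups[classify_model(score)].append(model_id)
    return dict(groups)


# Hypothetical docking models scored against a reference complex.
example_scores = {"model_a": 0.91, "model_b": 0.55, "model_c": 0.30, "model_d": 0.05}
groups = split_models(example_scores)

# "incorrect" models serve as decoys; higher classes as augmented data.
decoys = groups.get("incorrect", [])
augmented = groups.get("high", []) + groups.get("medium", [])
```

Whether "acceptable" models belong in the augmented set or are left out is a design choice that depends on how much label noise the downstream model can tolerate; the sketch leaves them out.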
Created:
2025-05-11



