Fine-tuning Pre-trained Antibody Language Models for Antigen Specificity Prediction
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
https://figshare.com/articles/dataset/Fine-tuning_Pre-trained_Antibody_Language_Models_for_Antigen_Specificity_Prediction/25342924
Resource description:
Abstract
Antibodies play a crucial role in the adaptive immune response, and their specificity to antigens is a fundamental determinant of immune function. Accurate prediction of antibody antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we explore the effect of fine-tuning pre-trained antibody language models on the prediction of binding specificity to SARS-CoV-2 spike protein and influenza hemagglutinin. We fine-tuned four pre-trained antibody language models on labeled data specific to these antigens and demonstrated that fine-tuned language model classifiers exhibit enhanced predictive accuracy compared to classifiers trained on pre-trained model embeddings. Additionally, we investigated how model attention activations change after fine-tuning to gain insights into the molecular basis of antigen recognition by antibodies. Furthermore, we applied the fine-tuned models to BCR repertoire data related to influenza and SARS-CoV-2 vaccination, demonstrating their ability to capture changes in the repertoire following vaccination. Overall, our study shows that fine-tuning makes pre-trained antibody language models more valuable tools for antigen specificity prediction.
Code
All code used for model training and testing is available on Bitbucket: https://bitbucket.org/kleinstein/projects/src/master/Wang2024/
An archived version of the Bitbucket repository is included in code.zip.
Data
The following files are included in data.zip:
S_FULL.parquet: Sequence and labels for S binding prediction (full-length).
S_CDR3.parquet: Sequence and labels for S binding prediction (CDR3 only).
HA_FULL.parquet: Sequence and labels for HA binding prediction (full-length).
HA_CDR3.parquet: Sequence and labels for HA binding prediction (CDR3 only).
S_repertoires.parquet: Repertoires used for S binding prediction.
HA_repertoires.parquet: Repertoires used for HA binding prediction.
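
For readers who want to inspect these tables, the following is a minimal sketch of one way to open the Parquet files inside data.zip with Python. It is not taken from the authors' repository. The helper names `list_parquet_members` and `read_parquet_member` are hypothetical, the folder layout inside the archive is an assumption (so files are matched by base name), and no column names are assumed because the listing does not describe the schema.

```python
import io
import zipfile
from pathlib import PurePosixPath


def list_parquet_members(zip_source):
    """Map each .parquet base name in the archive to its full member path.

    Hidden entries (base names starting with "."), such as the resource-fork
    files some archivers add, are skipped.
    """
    with zipfile.ZipFile(zip_source) as archive:
        members = {}
        for name in archive.namelist():
            base_name = PurePosixPath(name).name
            if base_name.lower().endswith(".parquet") and not base_name.startswith("."):
                members[base_name] = name
        return members


def read_parquet_member(zip_source, base_name):
    """Load one table into a pandas DataFrame.

    Requires pandas plus a Parquet engine such as pyarrow; both are imported
    lazily so that listing the archive works without them.
    """
    import pandas as pd

    members = list_parquet_members(zip_source)
    with zipfile.ZipFile(zip_source) as archive:
        payload = archive.read(members[base_name])
    return pd.read_parquet(io.BytesIO(payload))


# Example usage, assuming data.zip sits in the working directory:
#   print(sorted(list_parquet_members("data.zip")))
#   table = read_parquet_member("data.zip", "S_FULL.parquet")
#   print(table.columns.tolist())  # inspect the schema rather than assuming it
```

Listing the members first makes it easy to confirm which file names the archive actually contains before loading anything.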
Created:
2024-05-23



