Fine-tuning Pre-trained Antibody Language Models for Antigen Specificity Prediction
Source: Figshare | Updated: 2024-05-23 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Fine-tuning_Pre-trained_Antibody_Language_Models_for_Antigen_Specificity_Prediction/25342924
Resource description:
Abstract

Antibodies play a crucial role in the adaptive immune response, and their specificity to antigens is a fundamental determinant of immune function. Accurate prediction of antibody antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we explore the effect of fine-tuning pre-trained antibody language models on the prediction of binding specificity to SARS-CoV-2 spike protein and influenza hemagglutinin. We fine-tuned four pre-trained antibody language models on labeled data specific to these antigens and demonstrated that fine-tuned language model classifiers achieve higher predictive accuracy than classifiers trained on pre-trained model embeddings. Additionally, we investigated how model attention activations change after fine-tuning to gain insights into the molecular basis of antigen recognition by antibodies. Furthermore, we applied the fine-tuned models to BCR repertoire data related to influenza and SARS-CoV-2 vaccination, demonstrating their ability to capture changes in the repertoire following vaccination. Overall, our study shows that fine-tuning makes pre-trained antibody language models more valuable tools for antigen specificity prediction.

Code

All code used for model training and testing is available on Bitbucket: https://bitbucket.org/kleinstein/projects/src/master/Wang2024/ . An archived version of the Bitbucket repository is included in code.zip.

Data

The following files are included in data.zip:

S_FULL.parquet: Sequences and labels for S binding prediction (full-length).
S_CDR3.parquet: Sequences and labels for S binding prediction (CDR3 only).
HA_FULL.parquet: Sequences and labels for HA binding prediction (full-length).
HA_CDR3.parquet: Sequences and labels for HA binding prediction (CDR3 only).
S_repertoires.parquet: Repertoires used for S binding prediction.
HA_repertoires.parquet: Repertoires used for HA binding prediction.
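
As a convenience for readers who download data.zip, the following is a minimal sketch of how the listed Parquet files might be located and loaded in Python. It is not taken from the authors' repository. The helper names (`find_members`, `missing_files`, `load_table`) are hypothetical, the directory layout inside the archive is not stated in the description and is therefore matched by base name only, and `HA_CDR3.parquet` is an assumed name for the CDR3-only HA file. The column names of each table are also not documented here, so the sketch does not rely on any.

```python
import io
import zipfile
from pathlib import PurePosixPath

# File names as given in the description. HA_CDR3.parquet is an assumed name
# for the CDR3-only HA table; adjust it if the archive uses a different one.
EXPECTED_FILES = [
    "S_FULL.parquet",
    "S_CDR3.parquet",
    "HA_FULL.parquet",
    "HA_CDR3.parquet",
    "S_repertoires.parquet",
    "HA_repertoires.parquet",
]


def find_members(zip_path):
    """Map each file's base name to its full member path inside the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return {
            PurePosixPath(name).name: name
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def missing_files(zip_path, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from the archive."""
    present = find_members(zip_path)
    return [name for name in expected if name not in present]


def load_table(zip_path, file_name):
    """Read one Parquet member into a pandas DataFrame.

    Requires pandas and a Parquet engine such as pyarrow.
    """
    import pandas as pd  # imported lazily; the helpers above use only the stdlib

    member = find_members(zip_path)[file_name]
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)
    return pd.read_parquet(io.BytesIO(data))
```

A typical session would call `missing_files("data.zip")` first to confirm which tables are present, then `load_table("data.zip", "S_FULL.parquet")` and inspect the resulting DataFrame's `columns` before relying on any particular sequence or label field.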
Created:
2024-05-23



