Fine-tuning Pre-trained Antibody Language Models for Antigen Specificity Prediction
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-08-19
Download link:
https://figshare.com/articles/dataset/Fine-tuning_Pre-trained_Antibody_Language_Models_for_Antigen_Specificity_Prediction/25342924/1
Description:
Abstract
Antibodies play a crucial role in the adaptive immune response, and their specificity to antigens is a fundamental determinant of immune function. Accurate prediction of antibody antigen specificity is vital for understanding immune responses, guiding vaccine design, and developing antibody-based therapeutics. In this study, we explore the effect of fine-tuning pre-trained antibody language models on binding specificity prediction for the SARS-CoV-2 spike protein and influenza hemagglutinin. We fine-tuned four pre-trained antibody language models on labeled data specific to these antigens and demonstrated that fine-tuned language model classifiers achieve higher predictive accuracy than classifiers trained on pre-trained model embeddings. Additionally, we investigated how model attention activations change after fine-tuning to gain insight into the molecular basis of antigen recognition by antibodies. Furthermore, we applied the fine-tuned models to BCR repertoire data related to influenza and SARS-CoV-2 vaccination, demonstrating their ability to capture changes in the repertoire following vaccination. Overall, our study highlights fine-tuning of pre-trained antibody language models as a valuable approach to improving antigen specificity prediction.

Code
All code used for model training and testing is available on Bitbucket: https://bitbucket.org/kleinstein/projects/src/master/Wang2024/
An archived version of the Bitbucket repository is included in code.zip.

Data
The following files are included in data.zip:
S_FULL.parquet: Sequences and labels for S binding prediction (full-length).
S_CDR3.parquet: Sequences and labels for S binding prediction (CDR3 only).
HA_FULL.parquet: Sequences and labels for HA binding prediction (full-length).
HA_CDR3.parquet: Sequences and labels for HA binding prediction (CDR3 only).
S_repertoires.parquet: Repertoires used for S binding prediction.
HA_repertoires.parquet: Repertoires used for HA binding prediction.
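As a quick check after downloading, the contents of the data archive can be listed before loading anything. The following is a minimal sketch using only the Python standard library. The helper name list_parquet_members and the local path "data.zip" are assumptions made for illustration; the internal folder layout of the archive is not documented in this record, so the function matches on the file extension only.

```python
import zipfile
from pathlib import PurePosixPath


def list_parquet_members(zip_source):
    """Return the sorted names of all .parquet entries in a zip archive.

    zip_source may be a filesystem path or a binary file-like object.
    (Hypothetical helper; not part of the authors' code.)
    """
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Zip entry names always use forward slashes, hence PurePosixPath.
            if PurePosixPath(name).suffix == ".parquet"
        )


# Example usage, assuming data.zip was saved to the working directory:
# for name in list_parquet_members("data.zip"):
#     print(name)
```

Once extracted, each file can typically be read with pandas.read_parquet, which needs a Parquet engine such as pyarrow installed. The column names are not described in this record, so inspecting the loaded table's columns first is advisable.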
Provider:
figshare
Created:
2024-05-23



