PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/PAPreC_A_Pipeline_for_Antigenicity_Prediction_Comparison_Methods_across_Bacteria/28331788
Description:
Antigenicity prediction plays a crucial role in vaccine development,
antibody-based therapies, and diagnostic assays, as this predictive
approach helps assess the potential of molecular structures to induce
and recruit immune cells and drive antibody production. Several existing
prediction methods, which target complete proteins and epitopes identified
through reverse vaccinology, face limitations regarding input data
constraints, feature extraction strategies, and insufficient flexibility
for model evaluation and interpretation. This work presents PAPreC
(Pipeline for Antigenicity Prediction Comparison), an open-source,
versatile workflow (available at https://github.com/YasCoMa/paprec_nx_workflow) designed to address these challenges. PAPreC systematically examines
three key factors: the selection of training data sets, feature extraction
methods (including physicochemical descriptors and ESM-2 encoder-derived
embeddings), and diverse classifiers. It provides automated model
evaluation, interpretability through SHapley Additive exPlanations
(SHAP) analysis, and applicability domain assessments, enabling researchers
to identify optimal configurations for their specific data sets. Applying
PAPreC to IEDB data as a reference, we demonstrate its effectiveness
across the ESKAPE pathogen group. A case study involving Pseudomonas aeruginosa and Staphylococcus aureus shows that specific feature
configurations are more suitable for different sequence types, and
that ESM-2 embeddings enhance model performance. Moreover, our results
indicate that separate models for Gram-positive and Gram-negative
bacteria are not required. PAPreC offers a comprehensive, adaptable,
and robust framework to streamline and improve antigenicity prediction
for diverse bacterial data sets.
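
The three factors the description names (training data set, feature extraction method, classifier) can be pictured as a grid of configurations to compare. The following is only a minimal sketch of that idea, not PAPreC's actual code: the function names, the data set and classifier labels, and the choice of amino acid composition plus mean hydropathy as the "physicochemical descriptors" are all assumptions made for illustration. The hydropathy values are the standard Kyte-Doolittle scale.

```python
from itertools import product

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)  # fixed column order for the composition vector


def physicochemical_features(seq):
    """Hypothetical descriptor: 20 composition fractions plus mean hydropathy."""
    residues = [aa for aa in seq.upper() if aa in KD]  # skip non-standard symbols
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KD[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


def configuration_grid(datasets, extractors, classifiers):
    """Enumerate every (data set, extractor, classifier) combination."""
    return list(product(datasets, extractors, classifiers))


# All labels below are placeholders, not names used by PAPreC.
grid = configuration_grid(
    ["iedb_epitopes", "iedb_proteins"],
    ["physicochemical", "esm2_embedding"],
    ["random_forest", "svm"],
)
```

Each tuple in `grid` would then be trained and scored independently, which is what makes it possible to pick the best-performing configuration for a given data set. An ESM-2 embedding extractor would slot in alongside `physicochemical_features` with the same sequence-in, vector-out shape.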
Created:
2025-02-03



