Regression models generated by APRANK (computational prioritization of antigenic proteins and peptides from complete pathogen proteomes)

Source: NIAID Data Ecosystem. Indexed 2026-03-12.

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.zcrjdfnb1

Description:

The availability of highly parallelized immunoassays has renewed interest in the discovery of serology-based biomarkers for infectious diseases. Protein and peptide microarrays now provide a high-throughput platform for immunological screening of potential antigens and B-cell epitopes. However, there is still a need to prioritize relevant probes when designing these arrays. In this work we describe a computational method called APRANK (Antigenic Protein and Peptide Ranker), which integrates multiple molecular features to prioritize antigenic targets starting from a given pathogen proteome. These features include subcellular localization, presence of repetitive motifs, natively disordered regions, secondary structure, transmembrane spans and predicted interaction with the immune system. We applied this method to the prioritization of potential diagnostic antigens and peptides in a number of pathogen proteomes and human diseases: Borrelia burgdorferi (Lyme disease), Brucella melitensis (brucellosis), Coxiella burnetii (Q fever), Escherichia coli (gastroenteritis), Francisella tularensis (tularemia), Leishmania braziliensis (leishmaniasis), Leptospira interrogans (leptospirosis), Mycobacterium leprae (leprosy), Mycobacterium tuberculosis (tuberculosis), Plasmodium falciparum (malaria), Porphyromonas gingivalis (periodontal disease), Staphylococcus aureus (bacteremia), Streptococcus pyogenes (group A streptococcal infections), Toxoplasma gondii (toxoplasmosis) and Trypanosoma cruzi (Chagas disease). After training a linear regression model, the method achieves good to excellent performance on most species, measured by the enrichment of validated antigens at the top of the ranking. An unbiased validation using independent data sets shows that APRANK is successful in predicting antigenicity for all pathogen species tested. We make APRANK available to facilitate the identification of novel diagnostic antigens in infectious diseases.
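The description above evaluates performance by the enrichment of validated antigens at the top of the ranking. The exact metric is not given here, so the following is only a minimal sketch of one common reading: the fraction of known antigens among the top k ranked proteins, divided by their fraction in the whole proteome. The function name `top_k_enrichment`, its parameters, and the fold-enrichment definition are assumptions made for illustration, not APRANK's own code.

```python
def top_k_enrichment(scores, antigen_ids, k):
    """Fold enrichment of known antigens among the k best-scoring proteins.

    scores      -- dict mapping protein ID to a model score (higher = more antigenic)
    antigen_ids -- set of protein IDs that are validated antigens
    k           -- how many top-ranked proteins to inspect

    Hypothetical helper: the metric definition is an assumption, not taken
    from the APRANK description.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank protein IDs from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:k]
    hits_in_top = sum(1 for pid in top if pid in antigen_ids)
    hits_overall = sum(1 for pid in ranked if pid in antigen_ids)
    if hits_overall == 0:
        raise ValueError("no validated antigens among the scored proteins")
    # Ratio of the antigen rate in the top k to the antigen rate overall.
    return (hits_in_top / len(top)) / (hits_overall / len(ranked))
```

A value above 1 would mean validated antigens are over-represented among the top-ranked proteins compared with a random ordering.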
Methods: A curated dataset of known or validated antigens was obtained for each of the 15 human pathogens listed (bacteria and eukaryotes). Other proteins encoded in these annotated genomes were treated as non-antigenic, or as having no prior evidence of antigenicity. Using these data, a number of protein features were calculated or predicted with a bioinformatics pipeline (described in the manuscript). To create and train a generalized linear model, we first created 15 individual training sets (one per species), each containing 3000 proteins with balanced positive and negative training examples. A merged training set containing data from all species was used to train the protein model. A similar approach was followed to create and train a model for peptides (epitopes). In this case, we created 15 individual training sets, each containing a balanced set of 100,000 peptides.
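The training-set construction in the Methods (one balanced set per species, then a merged set across species) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline from the manuscript: the function names, the `(protein_id, is_antigen)` input layout, and the choice to oversample a scarce class with replacement are all hypothetical, since the description does not say how balance was achieved.

```python
import random


def draw(rng, items, k):
    """Sample k items: without replacement if enough exist, else with replacement."""
    if len(items) >= k:
        return rng.sample(items, k)
    # Assumption: a scarce class (typically known antigens) is oversampled.
    return rng.choices(items, k=k)


def balanced_sample(proteins, n_total, seed=0):
    """Return n_total // 2 positives and n_total // 2 negatives as (id, label) pairs.

    proteins -- list of (protein_id, is_antigen) pairs for one species
    """
    rng = random.Random(seed)
    positives = [pid for pid, is_antigen in proteins if is_antigen]
    negatives = [pid for pid, is_antigen in proteins if not is_antigen]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    per_class = n_total // 2
    pos = [(pid, 1) for pid in draw(rng, positives, per_class)]
    neg = [(pid, 0) for pid in draw(rng, negatives, per_class)]
    return pos + neg


def merged_training_set(per_species, n_total=3000, seed=0):
    """Build one balanced set per species and concatenate them.

    per_species -- dict mapping species name to its list of (protein_id, is_antigen)
    """
    merged = []
    for offset, species in enumerate(sorted(per_species)):
        sample = balanced_sample(per_species[species], n_total, seed + offset)
        merged.extend((species, pid, label) for pid, label in sample)
    return merged
```

With 15 species and the default `n_total=3000`, the merged set would hold 45,000 rows, half of them labeled positive. The same functions could be reused for peptides by passing `n_total=100000`.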

Created:

2021-06-28
