A Novel Bioinformatics Pipeline and a Machine Learning Approach for Antimicrobial Resistance Phenotypic Prediction
Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/novel-bioinformatics-pipeline-and-machine-learning-approach-antimicrobial-resistance-0
Description:
Overuse of antimicrobial drugs is known to increase bacterial resistance among surviving pathogens, reducing the effectiveness of future treatments. Publicly available sequencing collections, such as the National Center for Biotechnology Information Sequence Read Archive (SRA), allow for global investigation of antimicrobial resistance across pathogens. In this study, we developed a pipeline to process 10,803 bacterial isolates from the SRA (nine pathogens, four antibiotics), including 5,345 external isolates without metadata. The pipeline extracted SRA metadata to determine read layout and length, then applied quality control, trimming, and decontamination. Reads from the preprocessed isolates were mapped to antimicrobial resistance gene classes and to a strain-level genome reference library constructed from complete genomes on the SRA submitted between 1990 and 2020. Three classifiers (L1-penalized logistic regression, random forest, and extreme gradient boosting) were trained on the resulting feature matrices, and their outputs were combined in a majority-vote ensemble. Internal training resulted in 83.5% balanced accuracy on average, and external testing yielded 80.2%. Variable importance analyses identified known resistance gene classes and strain markers such as A. baumannii LAC-4, C. jejuni 81-176, and NCTC13255, confirming biological relevance. This work demonstrates a scalable approach for antimicrobial resistance prediction using heterogeneous sequencing data.
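The majority-vote step and the balanced-accuracy metric mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary phenotype labels (1 = resistant, 0 = susceptible), and the function names and toy predictions are hypothetical, with the three prediction lists standing in for the logistic regression, random forest, and gradient boosting outputs.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per isolate.

    `predictions` is a list of equal-length label lists, one per classifier.
    With three classifiers and binary labels there are no ties.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict the same isolates")
    # zip(*predictions) yields one tuple of votes per isolate.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 1 = resistant, 0 = susceptible.
y_true = [1, 1, 1, 0, 0, 1]
logreg = [1, 0, 1, 0, 1, 1]   # stand-in for L1-penalized logistic regression
forest = [1, 1, 0, 0, 0, 1]   # stand-in for random forest
xgb = [0, 1, 1, 0, 1, 0]      # stand-in for extreme gradient boosting

ensemble = majority_vote([logreg, forest, xgb])
score = balanced_accuracy(y_true, ensemble)
print(ensemble, score)
```

On this toy data the ensemble misclassifies one of the two susceptible isolates, so balanced accuracy (0.75) is lower than plain accuracy (5/6). Balanced accuracy is the more informative figure when resistant and susceptible isolates are unevenly represented.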
Provided by:
Owen Visser



