A Novel Bioinformatics Pipeline and a Machine Learning Approach for Antimicrobial Resistance Phenotypic Prediction

Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/novel-bioinformatics-pipeline-and-machine-learning-approach-antimicrobial-resistance-0
Resource description:
Overuse of antimicrobial drugs is known to increase bacterial resistance among surviving pathogens, reducing the effectiveness of future treatments. Publicly available sequencing collections, such as the National Center for Biotechnology Information Sequence Read Archive (SRA), allow for global investigation of antimicrobial resistance across pathogens. In this study, we developed a pipeline to process 10,803 bacterial isolates from the SRA (nine pathogens, four antibiotics), including 5,345 external isolates without metadata. The pipeline extracted SRA metadata to determine read layout and length, then applied quality control, trimming, and decontamination. Reads from the preprocessed isolates were mapped to antimicrobial resistance gene classes and to a strain-level genome reference library constructed from complete genomes on the SRA submitted between 1990 and 2020. Three classifiers (L1-penalized logistic regression, random forest, and extreme gradient boosting) were trained on the resulting feature matrices, and their outputs were combined in a majority-vote ensemble. Internal training resulted in 83.5% balanced accuracy on average, and external testing yielded 80.2%. Variable importance analyses identified known resistance gene classes and strain markers such as A. baumannii LAC-4, C. jejuni 81-176, and NCTC13255, confirming biological relevance. This work demonstrates a scalable approach for antimicrobial resistance prediction using heterogeneous sequencing data.
Provided by:
Owen Visser
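
The description mentions combining three classifiers in a majority-vote ensemble and reporting balanced accuracy. The following is a minimal sketch of those two ideas only, not the authors' implementation: the function names, the "R"/"S" (resistant/susceptible) labels, and the toy predictions are all hypothetical and chosen for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per isolate.

    `predictions` is a list of equal-length label lists, one per classifier.
    With three classifiers and two classes, a tie cannot occur.
    """
    combined = []
    for votes in zip(*predictions):
        # Pick the label that the most classifiers agreed on.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall, which is robust to class imbalance."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in indices if y_pred[i] == label)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: "R" = resistant, "S" = susceptible.
y_true = ["R", "R", "S", "S", "S", "S"]
logreg_pred = ["R", "S", "S", "S", "R", "S"]  # stand-in for L1 logistic regression
forest_pred = ["R", "R", "S", "R", "S", "S"]  # stand-in for random forest
boost_pred = ["S", "R", "S", "S", "R", "S"]   # stand-in for gradient boosting

ensemble_pred = majority_vote([logreg_pred, forest_pred, boost_pred])
score = balanced_accuracy(y_true, ensemble_pred)
```

Balanced accuracy averages recall over the classes, so a classifier that always predicts the majority phenotype scores 0.5 on a two-class problem regardless of how imbalanced the isolates are.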