A Random Forest Algorithm Based on a cgMLST Scheme to Predict hvKP
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJEB34922
Resource description:
Background: Hypervirulent Klebsiella pneumoniae (hvKP) infections can cause high morbidity and mortality owing to their invasiveness and virulence. However, no effective tools or biomarkers exist to discriminate hvKP from non-hvKP strains. We aimed to use a random forest algorithm to predict hvKP from core-genome data.

Methods: In total, 272 K. pneumoniae strains were collected from 20 tertiary hospitals in China and divided into hvKP and non-hvKP groups according to clinical criteria. Clinical data comparisons, whole-genome sequencing, virulence profile analysis and core-genome multilocus sequence typing (cgMLST) were performed. We then established a random forest predictive model based on the cgMLST scheme to prospectively identify hvKP. A random forest is an ensemble learning method that builds multiple decision trees during training; each tree outputs its own prediction for a given input. The predictive ability of the model was assessed by the area under the receiver operating characteristic curve (AUROC).

Results: Patients in the hvKP group were younger than those in the non-hvKP group (median age: 58.0 vs 68.0 years, p<0.001). More patients in the hvKP group had underlying diabetes mellitus (43.1% vs 20.1%, p<0.001). Clinically, carbapenem-resistant K. pneumoniae (CRKP) was less common in the hvKP group (4.1% vs 63.8%, p<0.001), whereas the K1/K2 serotypes, ST23 and positive string tests were significantly more frequent in the hvKP group. A cgMLST-based minimum spanning tree revealed that hvKP isolates were scattered sporadically within non-hvKP clusters. ST23 showed greater genome diversification than ST11, according to cgMLST-based allelic differences. Primary virulence factors (rmpA, iucA, a positive string test and the presence of the virulence plasmid pLVPK) were poor predictors of the hypervirulence phenotype. The random forest model based on the core-genome allelic profile showed excellent predictive power in both the training and validation sets (AUROC: 0.987 and 0.999, respectively).

Conclusions: This study compared the clinical and genomic characteristics of 272 K. pneumoniae isolates from hospitals in China. A predictive model based on the core-genome allelic profiles of these isolates was developed to prospectively identify hvKP. The model was more accurate than existing methods.
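The modelling idea described in the abstract (many trees voting on a cgMLST allelic profile, scored by AUROC) can be illustrated with a small pure-Python sketch. This is not the authors' pipeline: the allelic profiles and labels below are invented toy data, the function names (`fit_stump`, `fit_forest`, `predict_score`, `auroc`) are hypothetical, and each "tree" is simplified to a single split on one locus rather than a full decision tree.

```python
import random
from collections import Counter, defaultdict


def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stump(X, y, loci):
    """One-split tree: choose the locus whose per-allele majority rule makes the fewest errors."""
    best = None
    for j in loci:
        by_allele = defaultdict(Counter)
        for row, label in zip(X, y):
            by_allele[row[j]][label] += 1
        rule = {allele: c.most_common(1)[0][0] for allele, c in by_allele.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    default = Counter(y).most_common(1)[0][0]  # fallback for alleles unseen in training
    return best[1], best[2], default


def fit_forest(X, y, n_trees=25, n_loci=2, seed=0):
    """Bagging plus random locus subsets, the two sources of randomness in a random forest."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]      # bootstrap sample of isolates
        loci = rng.sample(range(len(X[0])), n_loci)   # random subset of loci for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], loci))
    return forest


def predict_score(forest, row):
    """Fraction of trees that vote for class 1 (here standing in for hvKP)."""
    votes = [rule.get(row[j], default) for j, rule, default in forest]
    return sum(votes) / len(votes)


# Toy data (assumption): rows are isolates, columns are cgMLST loci, values are allele numbers.
train_X = [[1, 3, 2, 5], [1, 3, 4, 5], [1, 2, 2, 6], [1, 3, 4, 6],
           [2, 7, 2, 5], [2, 8, 4, 6], [3, 7, 2, 5], [2, 8, 4, 6]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = hvKP, 0 = non-hvKP (invented labels)
valid_X = [[1, 3, 2, 6], [2, 7, 4, 5], [1, 2, 4, 5], [3, 8, 2, 6]]
valid_y = [1, 0, 1, 0]

forest = fit_forest(train_X, train_y)
valid_scores = [predict_score(forest, row) for row in valid_X]
print("validation AUROC on toy data:", auroc(valid_y, valid_scores))
```

Allele numbers are treated as categories rather than ordered values, since a cgMLST allele identifier carries no magnitude. A real analysis would more likely use an established library implementation with full-depth trees and a held-out validation set, as the abstract describes.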
Created:
2019-12-17



