Development of cgMLST schemes for Klebsiella oxytoca and Klebsiella planticola species complexes
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7477601
Resource description:
Core genome multi-locus sequence typing (cgMLST) provides high-resolution typing of bacterial populations. Two novel cgMLST schemes have been developed for the Klebsiella oxytoca and Klebsiella (Raoultella) planticola species complexes (KoSC and KplaSC), ubiquitous bacteria broadly responsible for clinical and community-acquired infections, to investigate population structure with greater resolution than conventional MLST analysis.
Dataset screening
Public genomes representing the overall population of KoSC (K. oxytoca, K. michiganensis, K. grimontii and K. pasteurii) and KplaSC (K. planticola and K. ornithinolytica) isolated worldwide were downloaded from the NCBI genome repository. Eight strains from the Collection of Institut Pasteur (CIP, Paris, France) (n=1 K. planticola, n=2 K. ornithinolytica for KplaSC; n=1 K. oxytoca, n=1 K. michiganensis, n=2 K. grimontii and n=1 K. pasteurii for KoSC) and six K. oxytoca reference strains (ATCC® 700324™, ATCC® 13182™, ATCC® 49131™, ATCC® 51983™, ATCC® 13030™ and ATCC® BAA-3059™; retrieved at https://www.atcc.org/) were added to the public dataset. Public genomes were first screened on assembly quality metrics: assemblies with more than 500 contigs, a total genome size outside the typical range for Klebsiella spp. (<4.5 Mb or >6.5 Mb), or an N50 lower than 30,000 bp were filtered out. Assemblies identified as mixed-species by the ribosomal MLST-based species identification tool (https://pubmlst.org/species-id) were also excluded. The remaining assemblies were then clustered together with KoSC and KplaSC genomes isolated from artisanal food productions (dataset available at https://www.ebi.ac.uk/ena/browser/view/PRJEB56668) based on pairwise MASH-based distances (https://gitlab.pasteur.fr/GIPhy/JolyTree), in order to select reference genomes for scheme construction. Representative genomes for each cluster were selected based on the best contiguity metrics and on the diversity of available metadata (location and collection date) and of sequence types (classical MLST, or rMLST where MLST was not available). A total of 104 genomes for KoSC and 51 genomes for KplaSC (directories “KoSC/Genomes” and “KplaSC/Genomes”) were used to build one cgMLST scheme per species complex.
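The assembly quality screening described above can be illustrated with a minimal Python sketch. It is not the code used to prepare this dataset: the function names and the list-of-contig-lengths input are hypothetical, and the thresholds are simply the ones stated in the text (at most 500 contigs, total size between 4.5 Mb and 6.5 Mb, N50 of at least 30,000 bp).

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50: the length of the contig at which the cumulative
    length, summed from longest to shortest, reaches half the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def passes_quality_filter(
    contig_lengths: List[int],
    max_contigs: int = 500,        # threshold stated in the text
    min_size: int = 4_500_000,     # 4.5 Mb
    max_size: int = 6_500_000,     # 6.5 Mb
    min_n50: int = 30_000,         # 30,000 bp
) -> bool:
    """Apply the three assembly-level filters to one genome (hypothetical helper)."""
    total = sum(contig_lengths)
    return (
        len(contig_lengths) <= max_contigs
        and min_size <= total <= max_size
        and n50(contig_lengths) >= min_n50
    )
```

For example, an assembly of five 1 Mb contigs passes all three filters, whereas an assembly of 250 contigs of 20 kb each has an acceptable size and contig count but fails on N50.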
cgMLST schemes creation
The chewBBACA suite v2.8.5 (https://github.com/B-UMMI/chewBBACA) was used to build the cgMLST schemes, following the recommendations and steps available at https://github.com/B-UMMI/chewBBACA#i-whole-genome-multilocus-sequence-typing-wgmlst-schema-creation. Paralogous loci were removed from the wgMLST schemes (archives “KoSC/Schema/wgMLST_17686_schema.tar.gz” and “KplaSC/Schema/wgMLST_13051_schema.tar.gz”), and the loci present in at least 99% of samples were extracted. The quality of the core loci was assessed with the chewBBACA Schema Evaluation module, and loci reported as having high allele variability were excluded from the schemes.
The cgMLST schemes were finally composed of 3,272 loci for KoSC (file “KoSC/Schema/cgMLST_3272_Genes_list.txt”) and 2,957 loci for KplaSC (file “KplaSC/Schema/cgMLST_2957_Genes_list.txt”).
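The 99% presence criterion used to extract core loci can be sketched in Python as follows. This is only an illustration of the idea, not the chewBBACA implementation. It assumes that a call counts as present when it is a positive allele number, optionally prefixed with "INF-", and that any other value (for example a text code such as "LNF", or "0") means the locus was not called. The function names and the nested-dictionary input are hypothetical.

```python
from typing import Dict, List


def is_present(call: str) -> bool:
    """Assumption: a locus is present if the call is a positive allele
    number, optionally prefixed with 'INF-' (newly inferred allele)."""
    value = call.strip()
    if value.startswith("INF-"):
        value = value[len("INF-"):]
    return value.isdigit() and int(value) > 0


def core_loci(
    profiles: Dict[str, Dict[str, str]], threshold: float = 0.99
) -> List[str]:
    """Return the loci called in at least `threshold` of the samples.

    `profiles` maps sample name -> {locus name -> allele call}.
    """
    samples = list(profiles)
    if not samples:
        return []
    kept = []
    for locus in profiles[samples[0]]:
        present = sum(is_present(profiles[s].get(locus, "")) for s in samples)
        if present / len(samples) >= threshold:
            kept.append(locus)
    return kept
```

With 100 samples, a locus missing from one sample (99% presence) is kept, while a locus missing from two samples (98%) is dropped.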
wgMLST and cgMLST allele profiles have been included for KoSC (files “KoSC/Allele_profiles/wgMLST_allele_profiles.tsv” and “KoSC/Allele_profiles/cgMLST_allele_profiles.tsv”) and KplaSC (files “KplaSC/Allele_profiles/wgMLST_allele_profiles.tsv” and “KplaSC/Allele_profiles/cgMLST_allele_profiles.tsv”).
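As a hedged illustration of how such allele profile tables might be used, the Python sketch below reads a tab-separated profile table and counts the allelic differences between two isolates. The layout is an assumption and should be checked against the actual files: a header row with the sample identifier in the first column followed by one column per locus. Calls that are not positive allele numbers are treated as missing and ignored, and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, TextIO


def read_profiles(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Parse a TSV table: first column = sample ID, other columns = loci."""
    reader = csv.reader(handle, delimiter="\t")
    loci = next(reader)[1:]
    profiles = {}
    for row in reader:
        if row:  # skip blank lines
            profiles[row[0]] = dict(zip(loci, row[1:]))
    return profiles


def allele_number(call: str) -> Optional[int]:
    """Return the allele number, or None if the call is missing
    (assumption: anything other than a positive integer is missing)."""
    value = call.strip().replace("INF-", "", 1)
    if value.isdigit() and int(value) > 0:
        return int(value)
    return None


def allelic_distance(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count loci called in both profiles that carry different alleles."""
    differences = 0
    for locus, call_a in a.items():
        allele_a = allele_number(call_a)
        allele_b = allele_number(b.get(locus, ""))
        if allele_a is not None and allele_b is not None and allele_a != allele_b:
            differences += 1
    return differences


# Toy table standing in for a real profile file (made-up values).
example = io.StringIO(
    "FILE\tlocus1\tlocus2\tlocus3\n"
    "isoA\t1\t2\tLNF\n"
    "isoB\t1\tINF-3\t4\n"
)
toy_profiles = read_profiles(example)
toy_distance = allelic_distance(toy_profiles["isoA"], toy_profiles["isoB"])
```

In the toy table, the two isolates share the allele at locus1, differ at locus2, and locus3 is ignored because it is missing in one of them, so the distance is 1.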
Created:
2023-04-13



