
Table S5 - S7 from Enterobase: hierarchical clustering of 100,000s of bacterial genomes into species/sub-species and populations

DataCite Commons. Updated 2022-07-15; indexed 2024-07-29.
Download link:
https://rs.figshare.com/articles/dataset/Table_S5_-_S7_from_Enterobase_hierarchical_clustering_of_100_000_s_of_bacterial_genomes_into_species_sub-species_and_populations/20319112/1
Description:
The definition of bacterial species is traditionally a taxonomic issue while bacterial populations are identified by population genetics. These assignments are species specific, and depend on the practitioner. Legacy multilocus sequence typing is commonly used to identify sequence types (STs) and clusters (ST Complexes). However, these approaches are not adequate for the millions of genomic sequences from bacterial pathogens that have been generated since 2012. EnteroBase (http://enterobase.warwick.ac.uk) automatically clusters core genome MLST allelic profiles into hierarchical clusters (HierCC) after assembling annotated draft genomes from short-read sequences. HierCC clusters span core sequence diversity from the species level down to individual transmission chains. Here we evaluate HierCC's ability to correctly assign 100 000s of genomes to the species/subspecies and population levels for <i>Salmonella, Escherichia, Clostridioides, Yersinia, Vibrio</i> and <i>Streptococcus</i>. HierCC assignments were more consistent with maximum-likelihood super-trees of core SNPs or presence/absence of accessory genes than classical taxonomic assignments or 95% ANI. However, neither HierCC nor ANI was uniformly consistent with classical taxonomy of <i>Streptococcus</i>. HierCC was also consistent with legacy eBGs/ST Complexes in <i>Salmonella</i> or <i>Escherichia</i> and with O serogroups in <i>Salmonella</i>. Thus, EnteroBase HierCC supports the automated identification of and assignment to species/subspecies and populations for multiple genera. This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
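
To make the idea of clustering allelic profiles at several levels concrete, here is a minimal sketch. It is not EnteroBase's implementation: it assumes single-linkage clustering on the number of differing loci, which the description above does not specify, and the function names, toy profiles and thresholds are all hypothetical.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci where two profiles carry different alleles (0 = missing, ignored)."""
    return sum(1 for x, y in zip(a, b) if x and y and x != y)


def single_linkage_clusters(profiles, threshold):
    """Label each genome with a cluster id; genomes linked by a chain of
    pairwise distances <= threshold share the same id (assumed linkage rule)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                # Keep the lexicographically smallest name as the cluster id.
                parent[max(ra, rb)] = min(ra, rb)
    return {n: find(n) for n in names}


def hierarchical_assignments(profiles, thresholds):
    """Cluster the same profiles at each threshold, from fine to coarse."""
    return {t: single_linkage_clusters(profiles, t) for t in thresholds}


# Toy cgMLST-like profiles: one allele number per locus (hypothetical data).
profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],
    "g3": [1, 1, 1, 2, 2, 2],
    "g4": [3, 3, 3, 3, 3, 3],
}
levels = hierarchical_assignments(profiles, thresholds=[0, 1, 3, 6])
for threshold, labels in levels.items():
    print(threshold, labels)
```

Low thresholds separate closely related genomes, while high thresholds merge them into broader groups, which mirrors the span from transmission chains up to species described in the abstract.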

Provider:
The Royal Society
Created:
2022-07-15