
Data from: Pan-genome and phylogeny of Bacillus cereus sensu lato

Source: DataONE; updated 2017-07-31; indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Resource description:
Background: Bacillus cereus sensu lato (s. l.) is an ecologically diverse bacterial group of medical and agricultural significance. In this study, I use publicly available genomes to characterize the B. cereus s. l. pan-genome and perform the largest phylogenetic and population genetic analyses of this group to date in terms of the number of genes and taxa included. With these fundamental data in hand, I identify genes associated with particular phenotypic traits (i.e., "pan-GWAS" analysis) and quantify the degree to which taxa sharing common attributes are phylogenetically clustered.

Methods: A rapid k-mer-based approach (Mash) was used to create reduced representations of selected Bacillus genomes, and a fast distance-based phylogenetic analysis of these data (FastME) was performed to determine which species should be included in B. cereus s. l. The complete genomes of eight B. cereus s. l. species were annotated de novo with Prokka, and Roary used these annotations to produce the B. cereus s. l. pan-genome. Scoary was used to associate gene presence and absence patterns with various phenotypes. The orthologous protein sequence clusters produced by Roary were filtered and used to build HaMStR databases of gene models, which were in turn used to construct phylogenetic data matrices. Phylogenetic analyses used RAxML, DendroPy, ClonalFrameML, PAUP, and SplitsTree. Bayesian model-based population genetic analysis with hierBAPS assigned taxa to clusters. The genealogical sorting index was used to quantify the phylogenetic clustering of taxa sharing common attributes.

Results: The B. cereus s. l. pan-genome currently consists of ≈60,000 genes, ≈600 of which are "core" (present in at least 99% of the taxa sampled). Pan-GWAS analysis revealed genes associated with phenotypes such as isolation source, oxygen requirement, and the ability to cause diseases such as anthrax or food poisoning.
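
The Results define "core" genes as those present in at least 99% of the sampled taxa, and the pan-GWAS step asks whether a gene's presence tracks a phenotype. The following is a minimal sketch of both ideas on a toy gene presence/absence table. The helpers `classify_genes` and `fisher_exact_two_sided` are hypothetical illustrations, not functions from Roary or Scoary. The association test shown is a plain two-sided Fisher's exact test on a single gene-by-trait 2x2 table; tools such as Scoary add multiple-testing corrections and a phylogeny-aware pairwise comparisons test, which are not reproduced here.

```python
from math import comb


def classify_genes(presence, n_genomes, core_pct=99):
    """Split gene clusters into core and accessory sets.

    presence maps a gene-cluster name to the set of genome IDs carrying it.
    A cluster is core if it occurs in at least core_pct percent of genomes;
    integer arithmetic avoids floating-point edge cases at the threshold.
    """
    core, accessory = set(), set()
    for gene, genomes in presence.items():
        if len(genomes) * 100 >= core_pct * n_genomes:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                 trait+  trait-
        gene+      a       b
        gene-      c       d

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # P(x of the col1 trait-positive genomes carry the gene)
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

For example, a gene carried by 8 of 9 trait-positive genomes and 2 of 7 trait-negative genomes gives `fisher_exact_two_sided(8, 2, 1, 5)`, a p-value of about 0.035 before any correction for testing thousands of genes.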
Extensive phylogenetic analyses using an unprecedented amount of data produced phylogenies that were largely concordant with each other and with previous studies. Phylogenetic support, as measured by bootstrap probabilities, increased markedly when all suitable pan-genome data were included in the analyses rather than only core genes. Bayesian population genetic analysis recommended subdividing the three major clades of B. cereus s. l. into nine clusters. Taxa sharing common traits and species designations exhibited varying degrees of phylogenetic clustering.
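
Bootstrap support for a branch is the percentage of bootstrap replicate trees that contain the same bipartition (split) of the taxa. The following is a minimal sketch of that count, assuming each tree has already been reduced to a list of splits, each given as one side of the bipartition; parsing trees into splits is not shown. The helpers `canonical_split` and `bootstrap_support` are hypothetical and are not the RAxML or DendroPy implementations.

```python
from collections import Counter


def canonical_split(side, taxa):
    """Key a bipartition by the side that excludes a fixed reference taxon,
    so that {A,B}|{C,D,E} and {C,D,E}|{A,B} map to the same key."""
    ref = min(taxa)
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side


def bootstrap_support(reference_splits, replicate_splits, taxa):
    """Percentage of bootstrap replicate trees containing each split of the
    reference tree, keyed by the reference split as given."""
    counts = Counter()
    for splits in replicate_splits:
        # A set comprehension counts each split at most once per replicate
        counts.update({canonical_split(s, taxa) for s in splits})
    n = len(replicate_splits)
    return {frozenset(s): 100.0 * counts[canonical_split(s, taxa)] / n
            for s in reference_splits}
```

Canonicalizing each split matters because the same branch can be written from either side; without it, a replicate that lists {C,D,E} would not be credited as supporting a reference split written as {A,B}.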

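The Mash step in the Methods compares genomes through small MinHash sketches rather than full sequences: each genome is reduced to its s smallest k-mer hashes, the Jaccard index j of two genomes is estimated from their merged sketches, and the Mash distance is D = -(1/k) ln(2j / (1 + j)). The following is a minimal re-implementation of that idea, not Mash itself. The assumptions are that sequences contain only A/C/G/T; that BLAKE2b from Python's `hashlib` is swapped in for the MurmurHash3 hash Mash uses; and that the helper names (`canonical_kmers`, `hash64`, `sketch`, `mash_distance`) are hypothetical. The defaults k = 21 and s = 1000 follow Mash's defaults.

```python
import hashlib
import math

# Complement table for reverse-complementing A/C/G/T sequences
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer or its reverse complement, whichever sorts first."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        yield min(kmer, kmer.translate(COMPLEMENT)[::-1])


def hash64(kmer):
    """64-bit hash of a k-mer (BLAKE2b stand-in; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes, sorted."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:s]


def mash_distance(sk1, sk2, k=21, s=1000):
    """Estimate the Jaccard index j from two sketches, then return
    D = -(1/k) * ln(2j / (1 + j)), written here as (1/k) * ln((1 + j) / (2j))."""
    set1, set2 = set(sk1), set(sk2)
    merged = sorted(set1 | set2)[:s]  # bottom-s sketch of the union
    if not merged:
        raise ValueError("both sketches are empty (sequences shorter than k?)")
    shared = sum(1 for h in merged if h in set1 and h in set2)
    j = shared / len(merged)
    if j == 0:
        return 1.0  # no shared hashes: this sketch caps the distance at 1
    return math.log((1 + j) / (2 * j)) / k
```

Because every genome is summarized by a fixed-size sketch, all pairwise distances can be computed cheaply and then passed to a distance-based tree builder such as FastME.
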
Created: 2017-07-31