41 Genomic islands that are identified from genomes of Salmonella HC20_373
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/41_Genomic_islands_that_are_identified_from_genomes_of_Salmonella_HC20_373/13503081
Description:
Pan-genome construction. A pan-genome of genes of over 300 bp in length was calculated from all HC20_373 genomes using PEPPAN [39] with the parameter ‘--min_cds 300’. Genes of <300 bp were excluded because they are particularly prone to assembly errors. Compatibility of the pan-genome with the reference sequences in the Salmonella whole-genome MLST (wgMLST) scheme was ensured by also specifying the ‘-g’ parameter. The resulting pan-genome contained 4627 orthologous groups, of which 4351 were already present in the wgMLST scheme and 276 were novel. One contiguous segment of DNA encoding 19 genes in SAL_HC8788AA_AS was scored as contaminating DNA because it was almost identical to a sequence from Veillonella parvula, as was a second segment of 5 genes, carried by 23 genomes, which was identical to sequences in PhiX prophage (Table S4B). These two segments were excluded from further analysis. The final pan-genome consists of 4603 genes (Table S4A).
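The length filter and the gene counts reported above can be illustrated with a minimal Python sketch. It is not the PEPPAN implementation: the `Gene` record, the identifiers and the toy data are hypothetical, and it assumes that a gene of exactly 300 bp is retained, since only genes of <300 bp are described as excluded.

```python
from dataclasses import dataclass

MIN_CDS_BP = 300  # length threshold stated in the text


@dataclass(frozen=True)
class Gene:
    gene_id: str               # hypothetical identifier
    length_bp: int             # coding sequence length in base pairs
    contaminant: bool = False  # True if the gene lies in a segment scored as contamination


def filter_pan_genome(genes):
    """Keep genes of at least MIN_CDS_BP that are not flagged as contamination."""
    return [g for g in genes if g.length_bp >= MIN_CDS_BP and not g.contaminant]


def final_gene_count(total_groups, contaminant_segment_sizes):
    """Subtract the genes in excluded segments from the total number of groups."""
    return total_groups - sum(contaminant_segment_sizes)


# Toy data (hypothetical): one short gene and one contaminant are dropped.
toy = [
    Gene("g1", 1200),
    Gene("g2", 150),
    Gene("g3", 900, contaminant=True),
    Gene("g4", 300),
]
kept = filter_pan_genome(toy)

# Counts taken from the text: 4351 wgMLST genes plus 276 novel genes,
# minus the 19-gene and 5-gene contaminating segments.
reported_total = 4351 + 276
reported_final = final_gene_count(reported_total, [19, 5])
```

With the figures from the text, `reported_total` is 4627 and `reported_final` is 4603, matching the sizes given for the initial and final pan-genome.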
Assignment of genes to genomic islands. We reconstructed the presence or absence of all genes in the pan-genome for each internal node of a core-genome maximum-likelihood phylogeny that had been constructed with TreeTime [40]. The most recent common ancestor (MRCA) of HC20_373 contained 4107 genes, which were interpreted as “ancestral”. Deletions of at least five consecutive genes at any internal node were scored as large deletions of genomic islands. The remaining 496 genes were acquired at internal nodes, and these were assigned to acquired genomic islands as previously described [41]. In brief, a directed graph of all orthologous genes was drawn, linking pairs of orthologous genes that were co-located on a single contig or on pairs of contigs that were linked by read pairs straddling both of them. The most likely gene order of the pan-genome was identified with Concorde [42] as the shortest possible path that visited all the genes in the graph, and was subsequently revised manually to break and re-join links to duplicated genes and collapsed repeats. All genomic islands are listed in Supplementary Table S4D, summarized in Supplementary Table S5, and illustrated in Supplementary Figure S1.
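The rule for scoring large deletions can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes that a pan-genome gene order and the reconstructed gene sets of a parent and a child node are already available, and the function name, arguments and toy data are hypothetical. Genes absent from the parent are skipped so that they do not interrupt a run of lost genes.

```python
MIN_RUN = 5  # minimum number of consecutive genes, as stated in the text


def large_deletions(gene_order, parent_genes, child_genes, min_run=MIN_RUN):
    """Return runs of >= min_run consecutive parent genes that are absent in the child.

    gene_order   -- list of gene identifiers in pan-genome order (assumed given)
    parent_genes -- set of genes reconstructed as present at the parent node
    child_genes  -- set of genes reconstructed as present at the child node
    """
    runs, current = [], []
    for gene in gene_order:
        if gene not in parent_genes:
            continue  # never present in the parent, so it cannot be deleted here
        if gene not in child_genes:
            current.append(gene)  # lost between parent and child
        else:
            if len(current) >= min_run:
                runs.append(current)
            current = []
    if len(current) >= min_run:  # a run may extend to the end of the order
        runs.append(current)
    return runs


# Toy example (hypothetical identifiers).
order = [f"g{i}" for i in range(1, 13)]
parent = set(order) - {"g5"}              # g5 was never present in the parent
child = {"g1", "g2", "g9", "g11", "g12"}  # g3, g4, g6, g7, g8 and g10 were lost
deleted = large_deletions(order, parent, child)
```

In this toy case the run g3, g4, g6, g7, g8 is reported as one large deletion, whereas the isolated loss of g10 falls below the five-gene threshold and is ignored.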
Created:
2021-04-01



