WTA - Tables F1000
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/WTA_-_Tables_F1000/14132192
Description:
This is a collection of Excel spreadsheet tables in support of the article "Horizontal Transfer and Evolution of Wall Teichoic Acid Gene Cassettes in Bacillus subtilis". The tables show characteristics of the pan-genome graph (PGG) and core genes for Bacillus subtilis to illuminate how wall teichoic acid (WTA) gene cassettes vary across different strains.
Table 1. The protein level orthologous OGCs
within the WTA cassettes. Column 1 is
the gene name/symbol. Column 2 is the set of OGCs determined to be orthologs at
the protein level. Column 3 is the number of the 108 strains in the PGG which
contain one of the protein level orthologs. Column 4 is OGC medoid sequence
RefSeq annotation for one of the protein level orthologs.
Table 2. OGC
subpatterns for the WTA cassettes across clades I-VII. The OGC subpatterns show
some limited recombination within the WTA cassettes but most recombination
seems limited to the entire cassette. Column 1 is the region between core OGCs
within the WTA cassette. Column 2 is an OGC subpattern. Columns 3-9 indicate
the number of strains within a clade that have the given OGC subpattern for that
row. The rows are ordered relative to their order in the WTA cassette from core
OGC 3712 to core OGC 3756.
Supplementary Table 1. The 108 strains
of B. subtilis ssp. subtilis in the PGG. Column 1 is the
BioSample ID ordered by and colored by clade. Column 2 is the GenBank assembly
accession. Column 3 is the GenBank species. Column 4 is the GenBank strain
name. Column 5 is the genome size in nucleotides. Column 6 indicates if the
strain is the type strain. Column 7
gives the clade for the strain.
Supplementary Table 2. The PGG
annotation of the WTA cassettes for the 108 B. subtilis ssp. subtilis strains. Column 1 is the contig identifier. Column 2 is the OGC locus
which gives the strain identifier (BioSample ID) and the OGC number. Column 3
is the start coordinate for the OGC. Column 4 is the stop coordinate for the OGC.
Column 5 is the RefSeq annotation for the OGC medoid gene sequence. Column 6 is
the strain identifier (BioSample ID).
Supplementary Table 3.
The OGC membership for the 144 OGCs found in one or more of the 108 WTA
cassettes. A cell value of 1 indicates
the OGC is present in that strain and 0 indicates the OGC is not present in
that strain. Column 1 is the OGC number ($ indicates a WTA bounding core OGC, *
indicates an internal core OGC). Columns 2-109 are one column per strain
labeled by the strain identifier (BioSample ID). The columns are colored as in
Figure 1 by clade: clade I blue, clade II purple, clade III orange, clade IV
yellow, clade V red, clade VI green, and clade VII teal. The rows are ordered
from bounding core OGC 3712 to bounding core OGC 3756. The rows are roughly
ordered by their ordering within WTA cassettes with parallel branches separated
by blank rows. Due to some shuffling within different branches the ordering may
not be exact for every strain – see Supplementary Table 2 or Supplementary
Table 8 for exact ordering within strains.
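The row labels of Supplementary Table 3 carry the core-OGC markers described above. A minimal sketch of reading such a label is shown here; it assumes the marker is attached directly to the OGC number (for example `3712$`), which is a guess about the spreadsheet layout, and the function name and the "unmarked" category are hypothetical.

```python
def parse_ogc_label(label):
    """Split a Supplementary Table 3 row label into (OGC number, kind).

    Assumption: '$' or '*' is attached to the number, before or after it.
    '$' marks a WTA bounding core OGC, '*' an internal core OGC.
    """
    text = label.strip()
    if "$" in text:
        kind = "bounding core"
    elif "*" in text:
        kind = "internal core"
    else:
        kind = "unmarked"  # hypothetical name for rows without a marker
    number = int(text.replace("$", "").replace("*", "").strip())
    return number, kind


def strain_count(row_values):
    """Number of strains containing the OGC, given that row's 0/1 cells."""
    return sum(1 for v in row_values if int(v) == 1)
```

For instance, `parse_ogc_label("3712$")` yields the number 3712 tagged as a bounding core OGC, and `strain_count` applied to a row's 108 cells gives the number of strains that contain that OGC.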
Supplementary Table 4.
The edge membership for the edges between the 144 OGCs found in one or more of
the 108 WTA cassettes. A cell value of
1 indicates the edge is present in that strain and 0 indicates the edge is not
present in that strain. Column 1 is the edge identifier. An edge identifier has the format (OGC#_3or5,OGC#_3or5)
which indicates which ends (5’ or 3’) of the OGCs are connected by the edge.
Columns 2-109 are one column per strain labeled by the strain identifier
(BioSample ID). The columns are colored as in Figure 1 by clade: clade I blue,
clade II purple, clade III orange, clade IV yellow, clade V red, clade VI
green, and clade VII teal. The rows are ordered from bounding core OGC 3712 to
bounding core OGC 3756. The rows are roughly ordered by their ordering within
WTA cassettes with parallel branches separated by blank rows. Due to some
shuffling within different branches the ordering may not be exact for every
strain – see Supplementary Table 2 or Supplementary Table 8 for exact ordering
within strains.
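The edge identifier format described for Supplementary Table 4 can be parsed mechanically. The sketch below assumes that `OGC#` is a bare integer and that the end is written as the digit 3 or 5, for example `(3712_3,3713_5)`; this example identifier and the function name are illustrative, not taken from the spreadsheet.

```python
import re

# Assumed layout: "(<OGC number>_<3 or 5>,<OGC number>_<3 or 5>)"
EDGE_RE = re.compile(r"^\(\s*(\d+)_([35])\s*,\s*(\d+)_([35])\s*\)$")


def parse_edge(edge_id):
    """Return ((ogc_a, end_a), (ogc_b, end_b)) for one edge identifier.

    Each end is reported as "3'" or "5'", the end of the OGC that the
    edge connects.
    """
    m = EDGE_RE.match(edge_id.strip())
    if m is None:
        raise ValueError(f"unrecognized edge identifier: {edge_id!r}")
    a, a_end, b, b_end = m.groups()
    return (int(a), a_end + "'"), (int(b), b_end + "'")


example = parse_edge("(3712_3,3713_5)")
```

Under these assumptions, `example` describes an edge joining the 3' end of OGC 3712 to the 5' end of OGC 3713.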
Supplementary Table 5. The Jaccard
distance matrix over the 144 OGCs in the WTA cassettes. Columns and rows are
strains (BioSample ID) and cells are pairwise Jaccard distances between
strains.
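The description does not state how the Jaccard distances were computed. A minimal sketch, assuming the standard definition (one minus the size of the intersection over the size of the union) applied to the 0/1 OGC membership of Supplementary Table 3, could look like this; the strain names and values are toy data.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 presence vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-zero vectors have distance 0.
    return 0.0 if either == 0 else 1.0 - both / either


def jaccard_matrix(membership):
    """membership maps strain ID -> list of 0/1 values, one per OGC."""
    strains = list(membership)
    return {
        s: {t: jaccard_distance(membership[s], membership[t]) for t in strains}
        for s in strains
    }


# Hypothetical toy data: three strains, four OGCs.
toy = {
    "strainA": [1, 1, 0, 1],
    "strainB": [1, 0, 0, 1],
    "strainC": [0, 0, 1, 0],
}
dist = jaccard_matrix(toy)
```

Here strainA and strainB share two of the three OGCs present in either, so their distance is 1/3, while strainA and strainC share none and have distance 1.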
Supplementary Table 6. The Average
Nucleotide Identity (ANI) matrix over the entire genome for the 108 strains.
Columns and rows are strains (BioSample ID) and cells are pairwise ANI between
strains. This can be modified into a distance matrix by subtracting all cell
values from 100.
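The conversion mentioned above is a single subtraction per cell. A small sketch, assuming the matrix has already been loaded as a list of numeric rows (the values below are toy data, not from the table):

```python
def ani_to_distance(ani):
    """Turn a square ANI matrix (percent identity) into a distance matrix."""
    return [[100.0 - value for value in row] for row in ani]


# Hypothetical 2 x 2 example.
toy_ani = [
    [100.0, 98.5],
    [98.5, 100.0],
]
toy_dist = ani_to_distance(toy_ani)
```

The diagonal becomes 0 only if each strain has an ANI of exactly 100 with itself.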
Supplementary Table 7. Tabular blastp
matches from the all versus all search of the 144 WTA OGC medoids. Matches were
retained if the match length was ≥80% of the shorter protein and ≥40% identity.
Matches are grouped by orthologs with a blank row between groups. At the bottom
some weaker potential tagF matches are shown with three blank rows
between groups. Column 1 is the query identifier. Column 2 is the subject identifier.
Column 3 is the percent peptide identity. Column 4 is the query begin of match.
Column 5 is the query end of match. Column 6 is the query length. Column 7 is
the subject begin of match. Column 8 is the subject end of match. Column 9 is
the subject length. Column 10 is the blastp expect value. Column 11 is the
blastp bitscore value. Column 12 is the subject header (medoid sequence RefSeq
annotation).
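The retention rule can be expressed against the column layout listed above. This is only a sketch: the table has no explicit alignment-length column, so the match length is approximated here by the query span (query end minus query begin plus one), which is an assumption and may differ from how the original filter was applied. The function name and sample rows are hypothetical.

```python
def passes_filter(row, min_cov=0.80, min_ident=40.0):
    """Check one Supplementary Table 7 style row against the stated thresholds.

    row follows the 12-column layout: query, subject, % identity,
    query begin, query end, query length, subject begin, subject end,
    subject length, e-value, bitscore, subject header.
    """
    ident = float(row[2])
    q_beg, q_end, q_len = int(row[3]), int(row[4]), int(row[5])
    s_len = int(row[8])
    match_len = q_end - q_beg + 1  # assumption: query span as match length
    shorter = min(q_len, s_len)
    return match_len >= min_cov * shorter and ident >= min_ident


# Hypothetical rows for illustration only.
kept = ["q1", "s1", "55.0", "1", "90", "100", "1", "90", "120", "1e-30", "200", "desc"]
low_identity = ["q1", "s2", "35.0", "1", "90", "100", "1", "90", "120", "1e-5", "60", "desc"]
short_match = ["q1", "s3", "70.0", "1", "50", "100", "1", "50", "100", "1e-10", "90", "desc"]
```

With these toy rows, only `kept` satisfies both thresholds.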
Supplementary Table 8. OGC patterns within the WTA cassettes. Column
1 is the region within the WTA. Column 2 is the gene name as determined from
the GenBank annotation of the B. subtilis
ssp. subtilis type strain. Column 3
is the number of strains which have the ortholog. Different OGCs appear in the
same row if they are protein orthologs from Table 1. Columns 4-111 are the OGC
patterns for each of the 108 strains identified by BioSample ID. Columns have
been grouped and colored by clade as shown in Figure 1. Cells contain the OGC number
based on the PGG annotation of the strains as given in Supplementary Table 2.
Supplementary Table 9. Unique OGC patterns
within the WTA cassettes. Columns from Supplementary Table 8 are collapsed if
they have identical OGC patterns. Row 2 specifies how many columns have been
collapsed for each pattern. Column 1 is the region within the WTA. Column 2 is
the gene name as determined from the GenBank annotation of the B. subtilis ssp. subtilis type strain. Column 3 is the number of strains which have
the ortholog. Different OGCs appear in the same row if they are protein
orthologs from Table 1. Columns have been grouped and colored by clade as shown
in Figure 1. Columns 4-31 are the unique OGC patterns in the 108 strains
identified by clade. Cells contain the OGC number based on the PGG annotation of
the strains as given in Supplementary Table 2.
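Collapsing identical columns, as described for Supplementary Table 9, amounts to grouping strains by their full OGC pattern. The sketch below shows one way to do that. It is not necessarily how the table was produced; in particular it ignores clade, whereas the table groups its columns by clade. The strain names and OGC numbers are toy values.

```python
def collapse_patterns(columns):
    """Group strains that share an identical OGC pattern.

    columns maps strain ID -> sequence of cell values, one per row.
    Returns a dict mapping each unique pattern (as a tuple) to the list
    of strains that have it, in first-seen order.
    """
    groups = {}
    for strain, pattern in columns.items():
        groups.setdefault(tuple(pattern), []).append(strain)
    return groups


# Hypothetical toy data: three strains, three rows.
toy_columns = {
    "s1": [3712, 3720, 3756],
    "s2": [3712, 3725, 3756],
    "s3": [3712, 3720, 3756],
}
groups = collapse_patterns(toy_columns)
counts = {pattern: len(strains) for pattern, strains in groups.items()}
```

The `counts` mapping plays the role of Row 2 in Supplementary Table 9, giving how many columns were collapsed into each unique pattern.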
Created:
2021-02-28



