
WTA - Tables F1000

Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/WTA_-_Tables_F1000/14132192/1
Description:
This is a collection of Excel spreadsheet tables in support of the article "Horizontal Transfer and Evolution of Wall Teichoic Acid Gene Cassettes in Bacillus subtilis". The tables show characteristics of the pan-genome graph (PGG) and core genes for Bacillus subtilis to illuminate how wall teichoic acid (WTA) gene cassettes vary across strains.

Table 1. The protein-level orthologous OGCs within the WTA cassettes. Column 1 is the gene name/symbol. Column 2 is the set of OGCs determined to be orthologs at the protein level. Column 3 is the number of strains, out of the 108 in the PGG, that contain one of the protein-level orthologs. Column 4 is the RefSeq annotation of the OGC medoid sequence for one of the protein-level orthologs.

Table 2. OGC subpatterns for the WTA cassettes across clades I-VII. The OGC subpatterns show some limited recombination within the WTA cassettes, but most recombination seems to involve the entire cassette. Column 1 is the region between core OGCs within the WTA cassette. Column 2 is an OGC subpattern. Columns 3-9 give the number of strains within each clade that have the OGC subpattern for that row. The rows follow their order in the WTA cassette, from core OGC 3712 to core OGC 3756.

Supplementary Table 1. The 108 strains of B. subtilis ssp. subtilis in the PGG. Column 1 is the BioSample ID, ordered and colored by clade. Column 2 is the GenBank assembly accession. Column 3 is the GenBank species. Column 4 is the GenBank strain name. Column 5 is the genome size in nucleotides. Column 6 indicates whether the strain is the type strain. Column 7 gives the clade for the strain.

Supplementary Table 2. The PGG annotation of the WTA cassettes for the 108 B. subtilis ssp. subtilis strains. Column 1 is the contig identifier. Column 2 is the OGC locus, which gives the strain identifier (BioSample ID) and the OGC number. Column 3 is the start coordinate for the OGC. Column 4 is the stop coordinate for the OGC. Column 5 is the RefSeq annotation for the OGC medoid gene sequence. Column 6 is the strain identifier (BioSample ID).

Supplementary Table 3. The OGC membership for the 144 OGCs found in one or more of the 108 WTA cassettes. A cell value of 1 indicates that the OGC is present in that strain, and 0 indicates that it is absent. Column 1 is the OGC number ($ indicates a WTA bounding core OGC, * indicates an internal core OGC). Columns 2-109 are one column per strain, labeled by the strain identifier (BioSample ID). The columns are colored by clade as in Figure 1: clade I blue, clade II purple, clade III orange, clade IV yellow, clade V red, clade VI green, and clade VII teal. The rows run from bounding core OGC 3712 to bounding core OGC 3756 and roughly follow their order within the WTA cassettes, with parallel branches separated by blank rows. Because of some shuffling within different branches, the ordering may not be exact for every strain; see Supplementary Table 2 or Supplementary Table 8 for the exact ordering within strains.

Supplementary Table 4. The edge membership for the edges between the 144 OGCs found in one or more of the 108 WTA cassettes. A cell value of 1 indicates that the edge is present in that strain, and 0 indicates that it is absent. Column 1 is the edge identifier. An edge identifier has the format (OGC#_3or5,OGC#_3or5), which indicates which ends (5' or 3') of the OGCs are connected by the edge. Columns 2-109 are one column per strain, labeled by the strain identifier (BioSample ID). The columns are colored by clade as in Figure 1: clade I blue, clade II purple, clade III orange, clade IV yellow, clade V red, clade VI green, and clade VII teal. The rows run from bounding core OGC 3712 to bounding core OGC 3756 and roughly follow their order within the WTA cassettes, with parallel branches separated by blank rows. Because of some shuffling within different branches, the ordering may not be exact for every strain; see Supplementary Table 2 or Supplementary Table 8 for the exact ordering within strains.

Supplementary Table 5. The Jaccard distance matrix over the 144 OGCs in the WTA cassettes. Columns and rows are strains (BioSample ID), and cells are pairwise Jaccard distances between strains.

Supplementary Table 6. The Average Nucleotide Identity (ANI) matrix over the entire genome for the 108 strains. Columns and rows are strains (BioSample ID), and cells are pairwise ANI values between strains. The matrix can be converted into a distance matrix by subtracting every cell value from 100.

Supplementary Table 7. Tabular blastp matches from the all-versus-all search of the 144 WTA OGC medoids. Matches were retained if the match length was ≥80% of the length of the shorter protein and the identity was ≥40%. Matches are grouped by orthologs, with a blank row between groups. At the bottom, some weaker potential tagF matches are shown, with three blank rows between groups. Column 1 is the query identifier. Column 2 is the subject identifier. Column 3 is the percent peptide identity. Column 4 is the query begin of match. Column 5 is the query end of match. Column 6 is the query length. Column 7 is the subject begin of match. Column 8 is the subject end of match. Column 9 is the subject length. Column 10 is the blastp expect value. Column 11 is the blastp bitscore value. Column 12 is the subject header (medoid sequence RefSeq annotation).

Supplementary Table 8. OGC patterns within the WTA cassettes. Column 1 is the region within the WTA. Column 2 is the gene name as determined from the GenBank annotation of the B. subtilis ssp. subtilis type strain. Column 3 is the number of strains that have the ortholog. Different OGCs appear in the same row if they are protein orthologs from Table 1. Columns 4-111 are the OGC patterns for each of the 108 strains, identified by BioSample ID. Columns have been grouped and colored by clade as shown in Figure 1. Cells contain the OGC number based on the PGG annotation of the strains as given in Supplementary Table 2.

Supplementary Table 9. Unique OGC patterns within the WTA cassettes. Columns from Supplementary Table 8 are collapsed if they have identical OGC patterns. Row 2 specifies how many columns have been collapsed for each pattern. Column 1 is the region within the WTA. Column 2 is the gene name as determined from the GenBank annotation of the B. subtilis ssp. subtilis type strain. Column 3 is the number of strains that have the ortholog. Different OGCs appear in the same row if they are protein orthologs from Table 1. Columns have been grouped and colored by clade as shown in Figure 1. Columns 4-31 are the unique OGC patterns in the 108 strains, identified by clade. Cells contain the OGC number based on the PGG annotation of the strains as given in Supplementary Table 2.
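
The relationship between a 0/1 membership matrix (as in Supplementary Table 3) and the distance matrices (Supplementary Tables 5 and 6) can be illustrated with a minimal sketch. It assumes the Jaccard distances are computed over per-strain presence/absence vectors, which the description suggests but does not state explicitly. The strain names and values below are hypothetical toy data, not taken from the spreadsheets.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 presence vectors."""
    if len(a) != len(b):
        raise ValueError("presence vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # present in both strains
    either = sum(1 for x, y in zip(a, b) if x or y)   # present in at least one
    # Two all-zero vectors share nothing and differ in nothing; treat as distance 0.
    return 0.0 if either == 0 else 1.0 - both / either


def jaccard_matrix(presence):
    """Pairwise distances for a dict mapping strain -> list of 0/1 values."""
    strains = list(presence)
    return {s: {t: jaccard_distance(presence[s], presence[t]) for t in strains}
            for s in strains}


def ani_to_distance(ani):
    """Convert an ANI matrix (percent) to distances by subtracting from 100."""
    return {s: {t: 100.0 - value for t, value in row.items()}
            for s, row in ani.items()}


# Hypothetical toy data: three strains, four OGCs (1 = present, 0 = absent).
presence = {
    "strain_A": [1, 1, 0, 1],
    "strain_B": [1, 0, 0, 1],
    "strain_C": [0, 0, 1, 0],
}
dist = jaccard_matrix(presence)

# Hypothetical ANI values in percent.
ani = {"strain_A": {"strain_A": 100.0, "strain_B": 98.5},
       "strain_B": {"strain_A": 98.5, "strain_B": 100.0}}
ani_dist = ani_to_distance(ani)
```

In this toy example strain_A and strain_B share two of the three OGCs present in either strain, so their Jaccard distance is 1 - 2/3.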
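
The retention rule described for Supplementary Table 7 (match length ≥80% of the shorter protein and ≥40% identity) could be applied to rows with that column layout roughly as follows. This is a sketch under stated assumptions: the match length is taken here as query end minus query begin plus one, which the description does not specify, and the `Hit` type, the `passes_filter` helper, and the identifiers in the example rows are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One tabular blastp match, following columns 1-9 of Supplementary Table 7."""
    query: str
    subject: str
    pct_identity: float
    q_begin: int
    q_end: int
    q_len: int
    s_begin: int
    s_end: int
    s_len: int


def passes_filter(hit, min_coverage=0.80, min_identity=40.0):
    """Apply the stated thresholds to a single match."""
    # Assumption: match length is measured on the query coordinates.
    match_len = hit.q_end - hit.q_begin + 1
    shorter = min(hit.q_len, hit.s_len)
    return match_len >= min_coverage * shorter and hit.pct_identity >= min_identity


# Hypothetical example rows (identifiers and numbers are made up).
hits = [
    Hit("OGC_A", "OGC_B", 55.0, 1, 300, 320, 5, 310, 350),  # long match, good identity
    Hit("OGC_A", "OGC_C", 35.0, 1, 300, 320, 1, 300, 310),  # identity below 40%
    Hit("OGC_A", "OGC_D", 60.0, 1, 100, 320, 1, 100, 200),  # covers under 80% of shorter
]
retained = [h for h in hits if passes_filter(h)]
```

Only the first toy row satisfies both thresholds; the other two each fail one condition.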
Provider:
figshare
Created:
2021-02-28