Additional file 3 of Epiphytic common core bacteria in the microbiomes of co-located green (Ulva), brown (Saccharina) and red (Grateloupia, Gelidium) macroalgae

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://figshare.com/articles/dataset/Additional_file_3_of_Epiphytic_common_core_bacteria_in_the_microbiomes_of_co-located_green_Ulva_brown_Saccharina_and_red_Grateloupia_Gelidium_macroalgae/23281003

Description:

Additional file 3: Description of supplementary tables.

Table S1. Data associated with the 16S rRNA gene amplicon-based community profiling for all six sample sources analyzed in this study, together with sequencing, assembly and binning statistics of the 23 metagenome datasets used in this study. These data include the sampling time, season, geographical location, sample and environmental metadata for each sample, as well as library information related to the amplicon sequencing. Also included are summary analyses of the average relative abundances grouped by season and sample type at the genus and family levels, as well as statistical analyses of the proportions of core and dominant taxa in each sample. In addition, this file contains diversity indices and average relative abundances at the domain, phylum, family, genus, OTU and ASV levels.

Table S2. Data associated with the 16S rRNA gene-based community analyses of cultured bacterial strains, including information on sampling time, season, geographical location, source, culture conditions, 16S rRNA sequence information, new species attributes and taxonomic status. Also included are summary analyses of average relative abundances at the phylum, family, genus and OTU levels, as well as core taxa analysis results at the family and genus levels (matched to the 16S amplicon data). In addition, the file contains EZcloud and SILVA 138 sequence alignment results.

Table S3. Summary data on the 1,619 MAGs and 965 draft genomes, including completeness, contamination, contig number, tRNA number, quality classification, size (Mbp), N50 value, species cluster ID in dRep, and the annotation results from GTDB SR202, EZcloud and SILVA 138, ordered according to their positions on the phylogenetic tree in Fig. 4.

Table S4. Summary information about the four categories of PULs / PUL-like loci used in this study that were found with sliding window lengths from 1 and 10. The information includes: taxonomic affiliation, length (number of genes), number and type of CAZyme genes, PUL composition (CAZyme genes, tonB, susCD, sulfatase genes), information on susCD genes in classical PULs and the density of CAZyme genes in each PUL.

Table S5. Information on PULs from this study and published reference PULs, including descriptions of each PUL cluster in the SusC/D protein trees (single susCD PULs, hybrid susCD PULs, tandem-repeat susCD PULs, and tandem-repeat and hybrid susCD PULs). Also included is information about the source genome, the source genome type, its taxonomy and habitat, as well as PUL ID, cluster number, number of CAZyme genes, composition (CAZyme genes, susCD, tonB and sulfatase genes), genomes and possible substrates. For classical PULs, detailed information on the SusC/D protein tree is provided, including gene ID, PUL ID, PUL type, PUL composition and potential substrates.

Table S6. Details on the four categories of PULs and PUL-like loci used in this study in the 1,619 MAGs and 965 draft genomes, including gene composition, gene locus tags and gene annotations from multiple databases (KEGG, CAZy, EggNOG, COG, SignalP, MEROPS and Pfam).

Table S7. Details on all BGCs predicted in the 1,619 MAGs and 965 draft genomes. This includes overall function predictions and gene function predictions according to KEGG, CAZy, EggNOG, COG, SignalP, MEROPS and Pfam searches.

Table S8. Annotated putative PUL substrates based on dbCAN-PUL data (dbCAN-PUL is a database of experimentally characterized CAZyme gene clusters and their substrates), and substrate and enzyme cleavage information from the CAZy database (http://www.cazy.org/). These substrates represent automatically derived, similarity-based bioinformatic predictions and are thus not as accurate as biochemical characterizations of PUL functions would be.

Created:

2023-06-01