Macroalgal genomics illuminate three disparate paths to multicellularity - Supplementary Data - ANALYSES

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7747823
Description:
Macroalgae are a polyphyletic group of multicellular aquatic organisms that are vital to global climate maintenance and have a wide variety of commercial applications. The lack of genomic datasets and poor physiological records preclude understanding their ecological roles and industrial potential. We de novo sequenced 121 macroalgal genomes from various climates spanning five major latitude parallels. The resultant genomic datasets reveal genetic bases for niche habitation facilitated by morphological complexity in diverse and extreme regions and illuminate the evolutionary mechanisms behind macroalgal diversification and specialization. Adhesome genes (e.g., cadherins, integrins, and lectins), extracellular matrix enzymes, and genes regulating cytoskeletal organization (e.g., spondins, Rho-type GTPases) predominantly distinguished macroalgal genomes from their microalgal correlates. Deep neural networks could accurately classify an alga as micro- or macro- from a set of significance-ranked genomic features (n = 251, entropy R² > 0.99, RASE = 0.001) as well as from adhesome gene sets (n = 110, entropy R² > 0.86). By deciphering the macroalgal adhesome, a clear picture of the genetic basis for the development and maintenance of complex algal tissues could be resolved. Sequences from giant viruses were rampant in the macroalgal genomes and coded for zinc-finger transcription factors, ankyrins, Rieske proteins, and other exotic codomains. Lineage-specific retentions of transcription factors, cadherins, integrins, polysaccharide-acting enzymes, and receptor kinases, many with predicted viral origins, outline the divergent mechanisms facilitating multicellularity in these three macroalgal lineages. This work sheds new light on the evolution of multicellularity in three phyla (Rhodophyceae, Chlorophyceae, and Ochrophyceae v. Phaeophyceae) through the lens of large-scale genomics and paves the way for the genomic exploration of macroalgal biology.

Data S3. Analysis data files.
This dataset includes data files for the analyses presented in the manuscript:
(A) Decontamination analysis, including iterative BLEACH contamination calls, GFF coordinates for contaminants, and downsampling analyses of decontaminated genomes. Related to Fig. S1.
(B) HMMsearch results for decontaminated assemblies for PFAMs. Related to Figs. 2-6.
(C) Ternary analyses, including dcGO enrichment for >80% purity sets for the three phyla. Related to Fig. 2.
(D) Comparative genomics analyses of micro- and macroalgal genomes, including intersection, response screening, metabolic pathway, GO enrichment, and aNN modeling analyses. Related to Fig. 3.
(E) Adhesome analysis, including HMMsearch results for adhesome domains and codomains and response screening analyses between phyla, habitat, climate, and micro- vs. macroalgae. The neural network model using the 110 adhesome PFAMs is also included in this dataset. Related to Fig. 4.
(F) Endogenous viral element analyses, including VFAM HMMsearch results, EVOPs, macroalgal sequences with EsV-1-7 codomains, and comparative analyses including response screens and hierarchical clustering results. Related to Fig. 5.
(G) All computational scripts used for analyses in this study. Scripts are either ‘.sh’ or ‘.sbatch’ files for execution in a Linux environment with a SLURM (https://github.com/SchedMD/slurm) high-performance computing (HPC) scheduler. Related to all analyses.
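For readers who want to tabulate the HMMsearch results, the following is a minimal Python sketch. It assumes the files follow HMMER's standard `--tblout` layout (whitespace-separated, `#` comment lines, target name in column 1, query name in column 3, full-sequence E-value in column 5). The dataset's actual file layout is not stated here, so that is an assumption. The function name `count_pfam_hits`, the E-value cutoff, and the sample rows are all hypothetical illustrations, not values from the study.

```python
from collections import Counter
from typing import Iterable


def count_pfam_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Counter:
    """Count distinct target sequences per query HMM in --tblout style text.

    The cutoff is an illustrative default, not the threshold used in the study.
    """
    seen = set()
    counts = Counter()
    for line in lines:
        # Skip blank lines and the '#' header/footer lines HMMER writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A standard --tblout row has 18 fixed columns plus a description.
        if len(fields) < 18:
            continue
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue > max_evalue:
            continue
        # Count each (protein, family) pair once.
        if (target, query) not in seen:
            seen.add((target, query))
            counts[query] += 1
    return counts


# Invented rows for illustration only; they are not taken from the dataset.
EXAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias ...
prot_0001 - Cadherin PF00028.20 1.2e-30 105.3 0.1 2.5e-30 104.2 0.1 1.3 1 0 0 1 1 1 1 -
prot_0002 - Cadherin PF00028.20 3.4e-08 33.0 0.0 5.0e-08 32.5 0.0 1.1 1 0 0 1 1 1 1 -
prot_0003 - Integrin_beta PF00362.21 0.5 5.1 0.0 0.9 4.3 0.0 1.2 1 0 0 1 1 0 0 -
"""

counts = count_pfam_hits(EXAMPLE.splitlines())
for family, n in sorted(counts.items()):
    print(family, n)
```

Running the function once per genome and stacking the resulting counters would give a genome-by-PFAM count matrix, which is one plausible input for the comparative analyses described in (D) and (E).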
Created:
2023-10-06