Emergence and evolution of heterocyte glycolipid biosynthesis enabled specialized nitrogen fixation in cyanobacteria
Source: NIAID Data Ecosystem. Indexed 2026-05-10.
Download link:
https://zenodo.org/record/10444557
Resource description:
Abstract
Paleontological and phylogenomic observations have shed light on the evolution of cyanobacteria. Nevertheless, the emergence of heterocytes, specialized cells for nitrogen fixation, remains unclear. Heterocytes are surrounded by heterocyte glycolipids (HGs), which contribute to protection of the nitrogenase enzyme from oxygen. Here, by comprehensive HG identification and screening of HG biosynthesis genes throughout cyanobacteria, we identify HG analogs produced by specific and distantly related non-heterocytous cyanobacteria. These structurally less complex molecules probably acted as precursors of HGs, suggesting that HGs arose after a genomic reorganization and expansion of ancestral biosynthetic machinery, enabling the rise of cyanobacterial heterocytes in an increasingly oxygenated atmosphere. Subsequently, HG chemical structure evolved convergently in response to environmental pressures. Our results open a new chapter in the potential use of diagenetic products of HGs and HG analogs as fossils for reconstructing the evolution of multicellularity and division of labor in cyanobacteria.
Here we supply:
Supplementary Data 1. Selected cyanobacterial genomes from the PATRIC genome database (now part of the BV-BRC database). Files called ‘selected_Cyanogenomes.genome_*.20220430.txt’ are sourced from the PATRIC File Transfer Protocol server (ftp.patricbrc.org). ‘gtdbtk.bac120.summary.tsv’ is the GTDB-Tk output file, and ‘qa.summary_extended.txt’ the CheckM output file.
Supplementary Data 2. HG biosynthetic gene clusters in selected PATRIC genomes and 14 newly sequenced genomes. The file ‘islands_on_contigs.3_ORFs_in_between.expanded_island_with_nucleotide_positions.txt’ contains the location of all hits to Anabaena sp. PCC 7120 HG biosynthesis genes. ORFs were predicted with Prodigal. The structure of a contig is as follows: “genome | contig”. The structure of a hit is as follows: “ORF number on contig | query (e-value; bit-score; start of alignment in query; end of alignment in query; query coverage per subject; start of alignment in subject; end of alignment in subject; subject coverage) [nucleotide position on contig start; nucleotide position on contig end; direction]”. Non-overlapping hits on the same ORF (see Online Methods) are connected with ‘&&&’ characters. An asterisk (‘*’) indicates that the hit is located at most 3 ORFs from a contig edge. Clusters of hits that are at most 3 open reading frames (ORFs) apart are connected with ‘~~~’ characters. The file ‘Supplementary_table.script_1.txt’ contains a summary of all identified hgl islands (i.e. clusters containing at least 7 unique HG biosynthesis gene hits).
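The hit notation described above can be read programmatically. The following is a minimal Python sketch, not the authors' code. It assumes that each piece between ‘~~~’ or ‘&&&’ separators follows the full hit structure quoted above, that the asterisk may appear anywhere in a hit string, and that query names contain no parentheses. The names `Hit`, `parse_hit`, `parse_island` and `is_hgl_island` are hypothetical, and the exact whitespace in the real file may differ.

```python
import re
from dataclasses import dataclass

# Assumed layout of one hit (taken from the description, spacing is a guess):
#   ORF | query (8 values separated by ';') [start; end; direction]
HIT_RE = re.compile(
    r"^\s*(?P<orf>\d+)\s*\|\s*(?P<query>[^(]+?)\s*"
    r"\((?P<stats>[^)]*)\)\s*"
    r"\[(?P<pos>[^\]]*)\]\s*$"
)


@dataclass
class Hit:
    orf: int           # ORF number on contig
    query: str         # Anabaena sp. PCC 7120 query gene
    evalue: float
    bitscore: float
    q_start: int       # start of alignment in query
    q_end: int         # end of alignment in query
    q_cov: float       # query coverage per subject
    s_start: int       # start of alignment in subject
    s_end: int         # end of alignment in subject
    s_cov: float       # subject coverage
    nt_start: int      # nucleotide position on contig, start
    nt_end: int        # nucleotide position on contig, end
    direction: str
    near_edge: bool    # True if the hit carries an asterisk


def parse_hit(text: str) -> Hit:
    """Parse one hit string; raise ValueError if it does not fit the assumed layout."""
    near_edge = "*" in text
    match = HIT_RE.match(text.replace("*", ""))
    if match is None:
        raise ValueError(f"unrecognised hit: {text!r}")
    stats = [s.strip() for s in match["stats"].split(";")]
    pos = [p.strip() for p in match["pos"].split(";")]
    if len(stats) != 8 or len(pos) != 3:
        raise ValueError(f"unexpected field count in hit: {text!r}")
    return Hit(
        orf=int(match["orf"]), query=match["query"],
        evalue=float(stats[0]), bitscore=float(stats[1]),
        q_start=int(stats[2]), q_end=int(stats[3]), q_cov=float(stats[4]),
        s_start=int(stats[5]), s_end=int(stats[6]), s_cov=float(stats[7]),
        nt_start=int(pos[0]), nt_end=int(pos[1]), direction=pos[2],
        near_edge=near_edge,
    )


def parse_island(text: str) -> list[list[Hit]]:
    """Split a cluster on '~~~', then split hits sharing an ORF on '&&&'."""
    return [
        [parse_hit(piece) for piece in group.split("&&&")]
        for group in text.split("~~~")
    ]


def is_hgl_island(island: list[list[Hit]], min_unique: int = 7) -> bool:
    """Apply the stated criterion: at least 7 unique HG biosynthesis gene hits."""
    return len({hit.query for group in island for hit in group}) >= min_unique
```

A call such as `parse_island(cluster_string)` returns one inner list per ORF, so hits joined by ‘&&&’ stay grouped together. The values in any test string are made up for illustration and do not come from the dataset.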
Supplementary Data 3. Phylogeny of representative cyanobacterial genomes based on a core gene superalignment. The folder contains the files used to generate Fig. 2a. The directory ‘IQ-TREE’ contains the tree file and iTOL annotation files. The file ‘dRep.representative_to_cluster.txt’ contains the dRep clusters. Note that the manually defined subclades in the iTOL annotation file ‘iTOL_annotation.manually_defined_clades.DATASET_STYLE.txt’ have a different numbering from the paper: subclades 0 and 1 are the ‘heterocytous sister clades’, and subclades 2-10 in the annotation file are heterocytous subclades 1-9 in the paper, respectively.
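The numbering offset between the iTOL annotation file and the paper is easy to get wrong when cross-referencing. A small Python sketch of the conversion stated above follows; the function name `annotation_to_paper_label` and the returned label strings are hypothetical and chosen for illustration only.

```python
def annotation_to_paper_label(subclade: int) -> str:
    """Map a subclade number from the iTOL annotation file to the paper's naming.

    Annotation subclades 0 and 1 are the 'heterocytous sister clades';
    annotation subclades 2-10 correspond to heterocytous subclades 1-9.
    """
    if subclade in (0, 1):
        return "heterocytous sister clade"
    if 2 <= subclade <= 10:
        return f"heterocytous subclade {subclade - 1}"
    raise ValueError(f"subclade {subclade} is outside the documented range 0-10")
```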
Supplementary Data 4. Phylogenies of a concatenation of 7 hgl island genes. The folder contains the files used to generate Fig. 4 and related figures. The directory ‘gene_trees_hgl_islands/IQ-TREE.all_7_genes_concatenated’ contains the tree file of an alignment of 255 hgl islands that were drawn on the core gene phylogeny. The directory ‘gene_trees_hgl_islands_2/IQ-TREE.all_7_genes_concatenated’ contains the tree file of an alignment of the 255 hgl islands and an additional 3 islands from non-diazotrophic cyanobacteria from within the heterocytous clade (Raphidiopsis curvata NIES-932, Cylindrospermopsis raciborskii CENA303, and Raphidiopsis brookii D9), and the island from Cylindrospermopsis raciborskii CYRF because this cyanobacterium does not possess a high-scoring hglT homolog anywhere on its genome (see Online Methods). The folder also contains the iTOL annotation files and alignments and phylogenies of individual genes.
Lipid data files. The folder contains all the UHPLC-HRMSn (Orbitrap) data files used in this study. The directory ‘CCY strains’ includes 26 heterocytous cyanobacterial cultures corresponding to 23 strains grown in nitrogen-deficient media; the resulting data are shown in Supplementary Table 11. The directory ‘HglT mutant’ contains the data files used to generate Supplementary Table 16. The directory ‘LEGE strains’ includes the UHPLC-HRMSn (Orbitrap) and GC-MS data files corresponding to 8 cultures of 2 non-heterocytous strains grown in media with and without nitrogen for 38 to 77 days; the resulting data are shown in Supplementary Tables 11, 19 and 20.
Plasmid maps. GenBank and FASTA files of plasmids generated in this study. The ‘HglT deletion’ directory contains the genomic region surrounding hglT in the wild-type strain and after deletion, used to generate Supplementary Fig. 12. pAM5404 is shown in Supplementary Fig. 13 and p(A)RP0XX are shown in Supplementary Fig. 14.
We also supply all the code used in this publication, including scripts for genome assembly, download of genomes from public repositories, quality and contamination checks, genome analysis, construction of the phylogenetic trees, hgl island identification, etc. The shell script 'commands.sh' contains all the code used to generate the content in the directory.
Created:
2025-12-30



