Genomics of new ciliate lineages provides insight into the evolution of obligate anaerobiosis - single gene datasets for phylogenomic analysis of anaerobic ciliates (SAL, Ciliophora), protein datasets for mitochondrial pathways prediction, and mitochondrial genomes

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.vx0k6djnm

Resource description:

Oxygen plays a crucial role in energetic metabolism of most eukaryotes. Yet, adaptations to low oxygen concentrations leading to anaerobiosis have independently arisen in many eukaryotic lineages, resulting in a broad spectrum of reduced and modified mitochondrion-related organelles (MROs). In this study, we present the discovery of two new class-level lineages of free-living marine anaerobic ciliates, Muranotrichea, cl. nov. and Parablepharismea, cl. nov., that, together with the class Armophorea, form a major clade of obligate anaerobes (APM ciliates) within the SAL (Spirotrichea, Armophorea, Litostomatea) group. To deepen our understanding of the evolution of anaerobiosis in ciliates, we predicted the mitochondrial metabolism of cultured representatives from all three classes in the APM clade, using transcriptomic and metagenomic data, and performed phylogenomic analyses to assess their evolutionary relationships. The predicted mitochondrial metabolism of representatives from the APM ciliates reveals functional adaptations of metabolic pathways that were present in their last common ancestor and likely led to the successful colonization and diversification of the group in various anoxic environments. Furthermore, we discuss the possible relationship of Parablepharismea to the uncultured deep-sea class Cariacotrichea based on single gene analyses. Like most anaerobic ciliates, all studied species of the APM clade host symbionts, which we propose to be a significant accelerating factor in the transitions to an obligately anaerobic lifestyle. Our results provide an insight into the evolutionary mechanisms of early transitions to anaerobiosis and shed light on fine-scale adaptations in MROs over a relatively short evolutionary timeframe.

Methods

Illumina NextSeq Sequencing

Metagenomes were sequenced from the picked cells and a culture using Illumina NextSeq (Illumina Inc.) (150-bp paired-end reads, 300-bp insert size).
The transcriptome was sequenced by the EMBL Genomics Core Facility (GeneCore), Heidelberg, Germany.

Metagenomic and Transcriptomic Assemblies

For metagenomes, Trimmomatic v0.36 was used to filter and trim paired reads from all samples. Filtered and trimmed reads were then combined and co-assembled with IDBA-UD v.1.1.1. Metagenomic contigs > 1 kb were binned using CONCOCT as implemented in Anvi’o v2.0.3, which bins contigs based on nucleotide composition and differential coverage data from mapping reads to the co-assembly. The reads mapping to an affiliated genomic bin of each target ciliate from the initial binning results were re-assembled with IDBA-UD v.1.1.1 to improve bin quality. CheckM was used to evaluate the completeness, contamination and taxonomy of genomic bins, as well as to identify rRNA gene sequences.

Additional metagenomic reads from Muranothrix gubernata SUMMARTIN, sequenced subsequently to obtain a mitochondrial genome from Muranotrichea, were quality trimmed using Trimmomatic v0.39, corrected by Rcorrector and assembled using metaSPAdes (SPAdes genome assembler v3.13.1) with default settings for paired-end reads in "only-assembler" mode.

Contigs representing mitogenomes were identified in metagenome assemblies by BLAST using available protein sequences from ciliate mitogenomes as a query (GU057832.1, JN383843.1, NC_014262.1). Subsequently, we used a manual approach based on iterative searches in Illumina reads (BLASTN) followed by read alignment to the corresponding contig (Geneious Prime; “map to reference” option) to unambiguously extend the longest Parablepharisma sp. mitogenome fragment. This resulted in a 41,931-bp sequence (as opposed to the original contig length of 35,877 bp). Initial annotations of mitogenome sequences were obtained using MFannot (online version from October 23, 2019).
Predicted genes recovered by MFannot (including hypothetical proteins) were individually checked to confirm or reveal their identity and to exclude possible bacterial contamination using BLAST against the nr GenBank database. Graphical genome maps were produced by OrganellarGenomeDRAW (OGDRAW) v1.3.1.

For transcriptomic data, quality trimming and removal of Illumina adapters and sequence contamination were done using Trimmomatic. Transcriptomes were assembled using the software package Trinity.

Datasets preparation and rRNA gene phylogenetic analyses

Two data sets were prepared: one of 18S rRNA gene sequences and one of 18S and 28S rRNA gene sequences concatenated using Mesquite [86]. In the counts that follow, the first number refers to the 18S data set and the second to the concatenated 18S and 28S data set. The data sets consisted of 17/2 newly determined sequences of Muranotrichea and Parablepharismea, 1/2 newly determined and 24/0 GenBank sequences of Metopida (Armophorea), 12/1 GenBank sequences of Clevelandellida (Armophorea), 3/0 GenBank sequences of Armophorida (Armophorea), 8/6 GenBank sequences of Litostomatea, 28/31 GenBank sequences of Spirotrichea sensu lato, 2/1 GenBank sequences of Odontostomatea, 1/0 GenBank sequence of Cariacotrichea, and 41/41 GenBank sequences of other ciliates (CONThreeP, Protocruziiea, Heterotrichea, Karyorelictea) used as outgroup. Environmental GenBank sequences KT346287 and KT346288, identified by BLAST as related to the newly described taxa, were excluded from the analyses because of their chimeric origin. The sequences were aligned using MAFFT on the MAFFT 7 server (http://mafft.cbrc.jp/alignment/server/) with the L-INS-i algorithm and default settings. The alignments were manually checked for chimeras and edited using BioEdit 7.0.9.0.

Phylogenetic trees for the 18S rRNA gene and for the concatenated 18S and 28S rRNA genes were constructed by maximum likelihood (ML) and Bayesian analyses. ML analyses were performed in RAxML 8.0.0 under the GTRGAMMAI model with 1000 rapid bootstraps. Node support was assessed by ML analysis of 1000 bootstrap data sets. Bayesian analyses were performed using PhyloBayes under the CAT-GTR model.
For the 18S rRNA gene, four independent chains were run for 34,720 generations (maxdiff = 0.0531, 20% burnin). For the concatenated 18S and 28S rRNA genes, four independent chains were run for 150,000 generations (maxdiff = 0.14, 20% burnin).

Phylogenomic analyses

Extensive phylogenomic datasets of ciliates, containing ~160 and 124 genes, respectively, were used as reference datasets. Up to five best hits were recovered (E-value < 1E-5) for each gene from the Muranothrix transcriptome using BLAST and translated into protein sequences using the Barrel-O-Monkeys package (http://rogerlab.biochemistryandmolecularbiology.dal.ca/monkeybarrel.php). In the case of metagenomes, genes of interest were recovered using an in-house script (https://github.com/DavidZihala/raw_gene_prediction?fbclid=IwAR2xbkMEUe8Fvl2A_6uJF2r4oT9IJWx7iopVQMxjqgows5VcxsXTYCi5lP8). Each recovered sequence was then reciprocally blasted against reference datasets enriched with well-defined known paralogs (e.g., EF-1α and EFL) to assist in the identification and removal of deep paralogs. BLAST searches against the Swiss-Prot database were used to trim away non-homologous sequence data at the ends of predicted genes.

Subsequently, all sequences were added to single-gene alignments. These were then aligned using the MAFFT L-INS-i algorithm (with default parameters) and trimmed in BMGE (gap threshold set to 0.3), and single gene trees were reconstructed using RAxML (model setting PROTGAMMALGF) with 100 rapid bootstraps. Single gene trees were then inspected by eye and proper orthologs were selected from targeted taxa, while putative paralogs were removed, creating the final version of the single gene datasets. Each single gene dataset was then aligned with the MAFFT L-INS-i algorithm (with default parameters), followed by trimming using BMGE v1.2 (gap threshold set to 0.3). Trimmed alignments were then concatenated into one super-matrix using the program alvert from the Barrel-O-Monkeys package.
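
To illustrate what the concatenation step produces, here is a minimal Python sketch. It is not the alvert program, whose interface is not described here. It assumes each trimmed single-gene alignment is already loaded as a mapping from taxon name to aligned sequence. The function name `concatenate_alignments`, the choice to pad taxa missing from a gene with gap characters, and the toy sequences are all assumptions made for illustration.

```python
def concatenate_alignments(alignments, missing="-"):
    """Join single-gene alignments into one super-matrix.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds the
    1-based inclusive column range occupied by each gene.
    """
    # Union of all taxa seen in any gene, in a stable order.
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in an alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon absent from this gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input (hypothetical sequences, for illustration only).
gene_a = {"Muranothrix": "MKV-L", "Tetrahymena": "MKVAL"}
gene_b = {"Tetrahymena": "GGS", "Parablepharisma": "GAS"}
matrix, parts = concatenate_alignments([gene_a, gene_b])
```

Recording the column range of each gene alongside the super-matrix keeps the per-gene boundaries available for later partitioned analyses.
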
The maximum likelihood phylogenomic tree was constructed in IQ-TREE using the C40+LG+F+G model with 1000 PMSF bootstraps. Bayesian analysis was performed using PhyloBayes with CAT-GTR+G with constant sites removed. Four independent chains were run for 16,000 generations (maxdiff = 0.002, 20% burnin).

In-silico predictions of mitochondrial metabolism

Genes of interest were collected from previously published studies of anaerobic mitochondria and from the published mitochondrial proteome of Tetrahymena thermophila. Candidate sequences from ciliates were recovered by BLAST, and hits with e-value < 1e-10 were extracted and added to the initial datasets for each gene. Published datasets were used as starting datasets, where available. When a published dataset was not available, starting datasets were constructed by BLAST searches of a seed sequence against nr (max target seqs = 100). Additionally, sequences from taxonomically diverse eukaryotic and prokaryotic representatives were identified through searches of the gene name in the NCBI protein and RefSeq databases. The resulting sequences were combined into a single dataset and clustered in CD-HIT at a 75% identity threshold. Each gene dataset was aligned using MAFFT L-INS-i (default settings) and trimmed using trimAl (gappyout), and trees were constructed using RAxML (PROTGAMMALG with 100 rapid bootstraps). Each alignment and tree was repeatedly inspected by eye, and sequences were added or removed as necessary. Each tree was then manually evaluated for the presence or absence of genes of interest in the studied lineages.
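
The e-value cutoff used to extract candidate sequences can be sketched in Python as follows. This is only an illustration and not the authors' script. It assumes the BLAST hits were written in the standard 12-column tabular format (`-outfmt 6`), in which the query ID, subject ID and e-value are columns 1, 2 and 11. The function name `filter_hits` and the sample identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Keep BLAST tabular hits whose e-value is below max_evalue.

    lines: iterable of tab-separated BLAST outfmt 6 rows.
    Returns a dict mapping query ID -> list of (subject ID, e-value).
    """
    hits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment rows
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 of outfmt 6
        if evalue < max_evalue:
            hits.setdefault(query, []).append((subject, evalue))
    return hits


# Hypothetical rows: query, subject, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
example_rows = [
    "seedA\tcontig_1\t62.5\t240\t85\t3\t1\t238\t10\t249\t1e-30\t210",
    "seedA\tcontig_2\t31.0\t90\t58\t2\t5\t92\t1\t90\t1e-5\t48",
    "# a comment row",
    "seedB\tcontig_7\t45.2\t180\t95\t4\t3\t180\t20\t198\t2e-12\t95",
]
kept = filter_hits(example_rows)
```

The same function covers the looser cutoff mentioned for the phylogenomic datasets by passing `max_evalue=1e-5`.
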

Created:

2020-05-05