Evolutionary history, novel lineages, and symbiont coevolution in the ant tribe Camponotini (Hymenoptera: Formicidae)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.hx3ffbgqd
Description:
Many insect groups have acquired obligate microbial symbionts, and the resulting associations can have important ecological and evolutionary consequences. A notable example among ants is the species-rich tribe Camponotini, whose members derive nutritional benefits from a vertically inherited, bacterial endosymbiont Blochmannia. We generate ultraconserved element (UCE) phylogenomic data for 220 ingroup and 5 outgroup taxa to reconstruct a detailed evolutionary history of the Camponotini, including inference of divergence times and dispersal events. Under multiple modes of analysis, including both concatenation and species-tree approaches, we recover a well-supported backbone phylogeny comprising eight lineages: three large genera (Camponotus, Colobopsis, Polyrhachis) and several smaller genera or clusters of genera. Three novel lineages are uncovered that cannot be placed in any existing genus: Lathidris gen. n., from the mountains of Mesoamerica; Retalimyrma gen. n., from the Indian Himalayas; and Uwari gen. n., from eastern Asia. The species in these new genera were described and placed erroneously in Camponotus. The tribe Camponotini is estimated to have a crown origin in the Eocene (median age 38.4 Ma), with successively younger crown ages for Colobopsis (22.5 Ma), Camponotus (18.6 Ma), and Polyrhachis (18.5 Ma). We infer an Australasian or Indomalayan origin for the tribe, with multiple dispersal events to the Afrotropics, Palearctic region, and New World. Phylogenetic analysis of selected Blochmannia genes from a subset of 97 camponotine taxa yields results that are largely congruent with the ant host phylogeny, at least for well-supported nodes, but we find evidence that Blochmannia from some old lineages—especially Lathidris—may have discordant histories, suggesting possible lability of this symbiosis in the early evolution of camponotine ants.
Methods
Phylogenomics
Taxon sampling and UCE data generation
Our taxon set comprises 220 species of Camponotini, representing all genera and most subgenera, and five outgroup species belonging to related genera in the subfamily Formicinae (Table S1). Camponotine ants were sampled roughly in proportion to the number of described species in each genus. Smaller taxon sets were employed for analyses comparing Blochmannia and ant phylogenies.
DNA was extracted from single ants, either adults or pupae, using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) and quantified with a Qubit fluorometer (HS Assay Kit, Life Technologies Inc., Carlsbad, CA). We sheared 5–50 ng of input DNA to a target size of ~600 bp using either a Diagenode BioRuptor (Diagenode Inc., Denville, NJ) or a Qsonica Q800R3-110 (Qsonica Inc., Newtown, CT). The sheared DNA served as input for the generation of ultraconserved element (UCE) sequence data, following the protocol described by Branstetter et al. (2017a), which involves the following steps: dual-indexed library preparation, library pooling, UCE-targeted enrichment, qPCR quantification of DNA concentrations, final pooling, and multiplex sequencing. UCE enrichment was carried out using a set of custom-designed probes, Hymenoptera 2.5Kv2A (MYcroarray, Inc., now Arbor Biosciences, Ann Arbor, MI), targeting 2524 UCE loci (Branstetter et al., 2017a). Sequencing was performed on an Illumina HiSeq 2500 at the University of Utah Huntsman Cancer Center. For 142 samples, library preparation, target enrichment, and sequencing were performed by RAPiD Genomics (Gainesville, FL) with similar protocols. Most Blochmannia assemblies and analyses were based on whole genome sequencing (WGS) runs using the platforms above. For select specimens requiring deeper sequencing, WGS was performed using DNA-seq library construction and 300 bp PE sequencing with an Illumina MiSeq v3 at the Duke University Sequencing and Genome Technologies Center.
Processing of UCE data
Demultiplexed FASTQ data were cleaned and trimmed with Illumiprocessor, a wrapper program using Trimmomatic (Bolger et al., 2014), in PHYLUCE v. 1.7.1 (Faircloth, 2016). Most cleaned reads were assembled with SPAdes v. 3.12.0 (Bankevich et al., 2012); a minority of older samples, identified by a “D” extraction code less than D1700, were assembled with Trinity v2013-02-25 (Grabherr et al., 2011). Sequence statistics are given in Table S2. Matching of UCE loci to probes, alignment with MAFFT, and internal alignment trimming with Gblocks (Castresana, 2000) were carried out within PHYLUCE as described in Blaimer et al. (2015; 2016). We then filtered the aligned, trimmed UCE loci by taxon occupancy, i.e., the proportion of taxa in which each locus was recovered. We chose two subsets for further analyses: a dataset of 1440 loci in which each locus was represented in >90% of the taxa, and a dataset of 2076 loci in which each locus was represented in >80% of the taxa. These two subsets were then concatenated and further trimmed for misaligned sequences using the program Spruceup (Borowiec, 2019). We set the cutoff initially to 0.95, 0.97, and 0.98 and kept all other parameters at their default values. For the resulting spruceup-trimmed alignments (90%-0.95-spruceup, 90%-0.97-spruceup, 90%-0.98-spruceup, 80%-0.95-spruceup, 80%-0.97-spruceup, 80%-0.98-spruceup hereafter), as well as the untrimmed 90% and 80% alignments, we calculated alignment statistics, such as the amount of missing data, the number of parsimony-informative sites (PIC), and base composition, using the program AMAS v1.0 (Borowiec, 2016) (Table S3).
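The occupancy filter described above can be illustrated with a minimal Python sketch. It is not the PHYLUCE implementation: the function name, the input format (a mapping from locus name to the taxa in which the locus was recovered), and the toy data are all assumptions made for illustration.

```python
def filter_loci_by_occupancy(locus_taxa, n_taxa, min_fraction):
    """Return the loci recovered in more than min_fraction of n_taxa taxa.

    locus_taxa maps a locus name to the collection of taxa in which it was
    recovered (a hypothetical input format, not PHYLUCE's own).
    """
    kept = []
    for locus, taxa in locus_taxa.items():
        occupancy = len(set(taxa)) / n_taxa
        if occupancy > min_fraction:  # strict ">" as in ">90% of the taxa"
            kept.append(locus)
    return sorted(kept)


# Toy data: 10 taxa and four made-up loci with occupancies 1.0, 0.9, 0.8, 0.5.
taxa = [f"taxon{i}" for i in range(10)]
toy_loci = {
    "uce-a": taxa[:10],
    "uce-b": taxa[:9],
    "uce-c": taxa[:8],
    "uce-d": taxa[:5],
}

loci_90 = filter_loci_by_occupancy(toy_loci, len(taxa), 0.90)
loci_80 = filter_loci_by_occupancy(toy_loci, len(taxa), 0.80)
print(loci_90)
print(loci_80)
```

Because the comparison is strict, a locus present in exactly 90% of taxa is excluded from the 90% set but retained in the 80% set, which is why the stricter threshold yields the smaller dataset.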
Extraction of Blochmannia sequence data
Most Blochmannia analyses were based on de novo assemblies of WGS data. In these cases, demultiplexed, paired-end FASTQ data were cleaned and trimmed using Illumiprocessor, which invokes Trimmomatic (Bolger et al., 2014). De novo assemblies were performed using Unicycler v0.5.0, an assembly pipeline for bacterial genomes that functions as a SPAdes optimizer when assembling Illumina data (Wick et al., 2017). For each assembly, contigs >2500 bp were searched against all protein-coding genes of published Blochmannia genome sequences, using blastx in NCBI-BLAST 2.12.0. Gene regions were extracted from contig sequences based on the coordinates of highly significant BLAST matches. Blochmannia protein-coding genes were consistently the 'best hits,' typically with e-values of 0.0. Our main analysis is based on seven concatenated protein-coding genes (dnaE, gidA, groEL, gyrA, gyrB, rpoB, and rpoC), with an alignment totaling 20,019 bp positions. These genes were selected based on their distribution across the Blochmannia genome and their central role in bacterial functions. Although assemblies varied in their completeness (ranging from numerous, shorter Blochmannia contigs to complete or near-complete genomes), this analysis is restricted to the 97 samples that are included in the host phylogeny and for which we could confidently detect Blochmannia genes. For select, deeper lineages with complete or near-complete Blochmannia genomes, we also extracted 16S rDNA and 23S rDNA genes. Protein-coding genes were aligned in MUSCLE (Edgar, 2004) as translated amino acid sequences and, post-alignment, back-translated to nucleotide codons. The 16S rDNA and 23S rDNA genes were aligned using the SINA rRNA aligner hosted by the SILVA project (Pruesse et al., 2012). Poorly aligned regions were trimmed using trimAl version 1.5.10 (Capella-Gutiérrez et al., 2009). All datasets were examined by eye to remove any remaining ambiguous alignment regions.
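The back-translation step (amino acid alignment to codon alignment) can be sketched as follows. This is a minimal illustration, not the tool used in the study: the function name is hypothetical, and it assumes each coding sequence is in frame, lacks a terminal stop codon, and corresponds residue-for-residue to its aligned protein.

```python
def back_translate(aligned_protein, cds):
    """Thread an unaligned coding sequence onto its gapped protein alignment.

    Each residue in aligned_protein consumes the next codon of cds;
    each gap character "-" becomes a three-nucleotide gap "---".
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")

    remaining = iter(codons)
    columns = []
    for aa in aligned_protein:
        columns.append("---" if aa == "-" else next(remaining))
    return "".join(columns)


# Toy example: a two-residue protein aligned with one internal gap.
codon_row = back_translate("M-K", "ATGAAA")
print(codon_row)
```

Aligning at the amino acid level and then threading codons back keeps gaps in multiples of three, so the resulting nucleotide alignment preserves the reading frame for codon-based models.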
To test the monophyly of Blochmannia and to evaluate root positions within Blochmannia, certain analyses included outgroups selected from published sequences of close relatives based on prior phylogenetic studies (Wernegreen et al., 2009; Jackson et al., 2022). These outgroups include closely related endosymbionts of other (non-camponotine) ant groups, as well as endosymbionts of mealybugs, psyllids, and various other insects as described in the legends of the supplementary figures.
Ant phylogenomic analyses
Phylogenetic analyses were performed both with and without data partitioning on four concatenated, spruceup-trimmed matrices (90%-0.97-spruceup, 90%-0.98-spruceup, 80%-0.97-spruceup, 80%-0.98-spruceup). We did not proceed with the 0.95 cutoff as this amount of trimming proved too stringent for these datasets. We partitioned our datasets using the Sliding-Window Site Characteristics (SWSC-EN) algorithm (Tagliacollo & Lanfear, 2018), which models patterns of rate variation within and among UCE loci by dividing loci into core and flanking regions. The rcluster algorithm (Lanfear et al., 2014) in PartitionFinder2 (Lanfear et al., 2017) was then used to combine subsets with similar properties. We analyzed the partitioned matrices, comprising 1142 (90%-0.98-spruceup), 1081 (90%-0.97-spruceup), 1587 (80%-0.98-spruceup), and 1585 (80%-0.97-spruceup) partitions, as well as their unpartitioned counterparts, using maximum likelihood (ML) best-tree searches and 1000 ultrafast bootstrap replicates in IQ-TREE v2.1.3 (Minh et al., 2020; Hoang et al., 2018). We employed ModelFinder in IQ-TREE (Kalyaanamoorthy et al., 2017) for unpartitioned matrices while implementing a GTR+G model for data subsets in partitioned matrices. Analyses specified the most distantly related taxon, Formica neogagates, as an outgroup. To perform coalescent analyses, we also estimated the best ML gene tree for each of the 2076 and 1440 UCE loci with >80% and >90% of taxa present, respectively, using IQ-TREE. These two sets of ML best trees were then used to perform coalescent species-tree analyses in ASTRAL-III v5.7.8 (Zhang et al., 2018).
Divergence dating
Until recently the tribe Camponotini contained two monotypic fossil genera, one fossil species attributed to Polyrhachis F. Smith, and about 30 fossil species assigned to Camponotus. The descriptions and illustrations of most of these fossils, however, inspire little confidence in their placement in the tribe Camponotini, since key features of the mandibles, antennal insertions, frontoclypeal complex, and metapleural gland (Bolton, 2003; Ward et al., 2016) are not discernible. Even for extant species, distinctions between Camponotus and some other genera in the same tribe are subtle and difficult to capture (Ward et al., 2016; Ward & Boudinot, 2021). For fossils, the uncertainty is much greater. Accordingly, we concur with Boudinot et al. (2024) that most of these fossils should be treated as incertae sedis in Formicinae. We are left with two described fossils that can be placed in the Camponotini with high confidence. (1) Eocamponotus mengei (Mayr), the only camponotine-like ant known from Baltic amber, shows a good degree of preservation of morphological features and was recovered as crown Camponotini by Boudinot et al. (2022) with strong support. It was not recovered in crown Camponotus, however. (2) Polyrhachis annosa Wappler et al., an impression fossil from late Miocene deposits of Greece (Wappler et al., 2009), can be reasonably assigned to its genus. Hence, for the purpose of divergence dating, we are limited to two fossil calibrations in the ingroup (Table S4): (1) a calibration on crown-group Camponotini with a minimum age of 36 Ma (Aleksandrova & Zaporozhets, 2008), and (2) a calibration on crown-group Polyrhachis with a minimum age of 5.3 Ma.
We performed divergence dating using approximate likelihood in MCMCTREE and codeml as included in PAML v4.9 (Yang, 2007), using both the 80%-0.98-spruceup and 90%-0.98-spruceup matrices and the best maximum likelihood tree resulting from SWSC-EN partitioned analysis of these matrices. We pruned these matrices and trees to exclude all outgroups except the most closely related taxon, Myrmoteras iriodum Moffett, to prevent possible artifacts resulting from an imbalance in taxon sampling and rate heterogeneity between ingroup and outgroup (Duchêne et al., 2015; Spasojevic et al., 2021), and to reduce computational cost. In addition to the two fossil calibrations outlined above, we applied a secondary calibration on the root node based on divergence ages between Camponotini and Myrmoteras estimated across Formicinae by Blaimer et al. (2015). For this calibration we used a broad age bracket of 61.8–95.6 Ma, representing the 95% HPD intervals estimated across three analyses in that study. We used the default settings for the calibration priors: a heavy-tailed density based on a truncated Cauchy distribution with an offset p=0.1, a scale parameter c=1, and a left tail probability of a=0.025. By default, all calibrations in MCMCTREE are implemented as soft bounds. We set up four independent runs using the independent-rates model as a clock model, a GTR model for substitutions, and otherwise default parameters. We assessed MCMC convergence and effective sample sizes using Tracer v1.7.2 (Rambaut et al., 2018). Convergence (i.e., most ESS >200) was achieved with nsample=500,000, sampfreq=100, and burnin=100,000, and we summarized results across all four runs for each dataset (2,000,000 samples excluding burn-in). To evaluate our calibrations and the informativeness of our data, we also performed analyses without sequence data using only the prior.
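The sampling arithmetic behind the pooled total of 2,000,000 samples can be checked with a short Python sketch. It assumes that a chain runs for burnin + sampfreq × nsample iterations and records one sample every sampfreq iterations after burn-in, which is our reading of how the MCMCTREE control-file variables burnin, sampfreq, and nsample interact; the function name is hypothetical.

```python
def chain_length(burnin, sampfreq, nsample):
    """Total MCMC iterations for one run, assuming one retained sample
    every `sampfreq` iterations after `burnin` discarded iterations."""
    return burnin + sampfreq * nsample


N_RUNS = 4           # independent runs per dataset (from the text)
BURNIN = 100_000     # discarded iterations per run
SAMPFREQ = 100       # iterations between retained samples
NSAMPLE = 500_000    # retained samples per run

iterations_per_run = chain_length(BURNIN, SAMPFREQ, NSAMPLE)
pooled_samples = N_RUNS * NSAMPLE

print(iterations_per_run)  # iterations needed for a single run
print(pooled_samples)      # samples summarized across all runs
```

Under these assumptions each run covers 50,100,000 iterations, and pooling the four runs gives the 2,000,000 post-burn-in samples reported above.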
Biogeographic analyses
The biogeographic history of Camponotini was inferred using BioGeoBEARS (Matzke, 2013), following the tutorials available on the BioGeoBEARS PhyloWiki (http://phylo.wikidot.com/biogeobears). We constructed a distribution matrix by scoring all taxa for six designated biogeographic areas: Neotropical, Nearctic, Palearctic, Afrotropical (including Malagasy), Indomalayan, and Australasian (after Cox 2001) (Table S5). For the separation of Indomalaya and Australasia, we referred to the Wallace Line, except that Sulawesi was included in Indomalaya in our analysis, which is more consistent with general ant distribution patterns. We first used the chronograms resulting from MCMCTREE analyses of the 80%-0.98-spruceup and 90%-0.98-spruceup datasets, including all ingroup taxa and the outgroup Myrmoteras iriodum, together with the distribution matrix, for a set of unconstrained analyses without dispersal constraints between biogeographic areas. A second set of constrained analyses was then performed, implementing dispersal constraints defined by the level of connectivity between these biogeographic areas: 1.0 for adjacent areas connected by a landmass; 0.5 for adjacent areas separated by large water gaps, i.e., Neotropical/Afrotropical, Neotropical/Australasian, and Afrotropical/Australasian; and 0.0001 for non-adjacent areas (Table S6). Finally, we ran both the unconstrained and constrained analyses again using modified input chronograms, in which we pruned the outgroup taxon Myrmoteras iriodum. For each combination (total of eight), we tested the three main models implemented in BioGeoBEARS: the dispersal and extinction cladogenesis (DEC) model (Ree & Smith, 2008), the DIVALIKE model, a likelihood version of the Dispersal-Vicariance model (Ronquist, 1997), and the BAYAREA-LIKE model, a likelihood version of the Bayesian Analysis of Biogeography model (Landis et al., 2013).
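The dispersal-constraint scheme can be sketched as a symmetric multiplier matrix. This Python sketch is illustrative only: the three water-gap pairs come from the text, but the land-connected pairs shown are hypothetical placeholders (the actual values are in Table S6), and setting the diagonal to 1.0 is an assumed convention. The function name is also an assumption.

```python
AREAS = ["Neotropical", "Nearctic", "Palearctic",
         "Afrotropical", "Indomalayan", "Australasian"]

# Pairs separated by large water gaps, as listed in the text.
WATER_GAP_PAIRS = [("Neotropical", "Afrotropical"),
                   ("Neotropical", "Australasian"),
                   ("Afrotropical", "Australasian")]

# HYPOTHETICAL land-connected pairs, for illustration only; see Table S6.
EXAMPLE_LAND_PAIRS = [("Neotropical", "Nearctic"),
                      ("Palearctic", "Indomalayan")]


def dispersal_matrix(areas, land_pairs, water_gap_pairs,
                     land=1.0, water=0.5, non_adjacent=0.0001):
    """Build a symmetric dispersal-multiplier matrix as nested lists."""
    index = {area: i for i, area in enumerate(areas)}
    n = len(areas)
    matrix = [[non_adjacent] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # assumed convention: no penalty within an area
    for pairs, value in ((water_gap_pairs, water), (land_pairs, land)):
        for a, b in pairs:
            matrix[index[a]][index[b]] = value
            matrix[index[b]][index[a]] = value
    return matrix


multipliers = dispersal_matrix(AREAS, EXAMPLE_LAND_PAIRS, WATER_GAP_PAIRS)
for area, row in zip(AREAS, multipliers):
    print(f"{area:<13}", row)
```

Any pair not named in either list falls back to the non-adjacent multiplier of 0.0001, so dispersal between such areas is strongly penalized but not forbidden.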
We did not incorporate the jump dispersal parameter “j” into our models due to doubts about its statistical performance (Ree & Sanmartín, 2018). We defined max_range_size = 2, as no taxon in our analyses occupies more than two areas. We summarized log-likelihoods as well as AIC and AICc scores of all models and plotted results for the best-fitting model according to AICc in each set of analyses.
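Two quantities in this paragraph lend themselves to a short worked example: the number of geographic ranges allowed by the range-size cap, and AICc-based model comparison. The Python sketch below uses the standard AIC and AICc formulas. The log-likelihoods are invented for illustration and are not results from this study. Treating each model as having two free parameters (as when "j" is excluded) and using the number of tips as the sample size are assumptions.

```python
from math import comb


def n_ranges(n_areas, max_range_size, include_null=True):
    """Number of ranges of at most max_range_size areas (plus the null range)."""
    total = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    return total + (1 if include_null else 0)


def aic(lnl, k):
    """Akaike information criterion for log-likelihood lnl and k parameters."""
    return 2 * k - 2 * lnl


def aicc(lnl, k, n):
    """Small-sample corrected AIC for sample size n."""
    return aic(lnl, k) + (2 * k * (k + 1)) / (n - k - 1)


print(n_ranges(6, 2))  # six areas, ranges capped at two areas
print(n_ranges(6, 6))  # six areas, no effective cap

# HYPOTHETICAL log-likelihoods, for illustration only.
example_lnl = {"DEC": -250.0, "DIVALIKE": -245.0, "BAYAREALIKE": -270.0}
N_TIPS = 221  # assumed sample size: 220 ingroup taxa plus one outgroup
scores = {model: aicc(lnl, k=2, n=N_TIPS) for model, lnl in example_lnl.items()}
best_model = min(scores, key=scores.get)
print(best_model)
```

With six areas, capping ranges at two areas reduces the state space from 64 to 22 ranges (counting the null range). Under the usual convention the model with the lowest AICc, equivalently the highest AICc weight, is preferred.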
Blochmannia phylogenetic analyses
All Blochmannia phylogenies were estimated using IQ-TREE 2.2.2.7 for Linux or IQ-TREE 2.2.2.6 for macOS (Minh et al., 2020). For each dataset, we used the ModelFinder option (Kalyaanamoorthy et al., 2017) to determine the best-fit substitution model for the sequence format analyzed (DNA or codon sequences). Under the best-fit model, we estimated the maximum likelihood (ML) best tree and a bootstrap consensus tree based on an ultrafast bootstrap approximation with 1000 replicates (Hoang et al., 2018). To analyze possible root positions or congruence with host relationships, we also performed tree topology tests within IQ-TREE. These tests compare the log-likelihood of a dataset on the ML best tree with its log-likelihood when constrained to one or more alternative topologies.
Created:
2025-02-26



