five

Diversification and evolution of Hawaiian Megalagrion damselflies (Pinapinao, Odonata: Coenagrionidae)

收藏
NIAID Data Ecosystem2026-05-10 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.18931zd80
下载链接
链接失效反馈
官方服务:
资源简介:
Hawaiʻi’s pinapinao (Megalagrion McLachlan) comprises a radiation of 23 endemic damselfly species within Coenagrionidae. Despite being a unique study system for understanding geology’s impacts on evolutionary processes among Odonata, the understanding of these damselflies’ temporal, geographic, and phylogenetic origins remains incomplete. Testing macroevolutionary hypotheses has been hampered by conflicting topologies. To resolve these uncertainties, we performed phylogenetic analyses including divergence time estimation with 90 nuclear loci (>50 kbp) and 2 mitochondrial loci (>1 kbp), sampling representatives from 37 genera within core Coenagrionidae and 90% of Megalagrion species, including multiple island populations. We used ancestral range estimations, diversification analyses, agent-based simulation modeling, and ancestral state reconstruction to infer the group’s origin and biogeography and assess traits’ roles in diversification. Our findings indicate Megalagrion’s ancestor diverged from core Coenagrionidae early Eocene (~51 MA) and diversified early Miocene (~19 MA), suggesting Megalagrion’s MRCA predates Kauaʻi’s emergence by 7–21 MY. Diversification analyses suggest a low rate after Megalagrion diverged from Coenagrionidae, followed by a sudden increase around 19 MA, and simulation modeling supports extinction playing a significant role. Extant Megalagrion diversity is largely explained by ecological diversification into at least five clades with distinct breeding habitats that likely evolved on the Northwestern Hawaiian Islands, which are now sunken seamounts. Speciation continued as descendants dispersed to the current Hawaiian Islands as islands emerged. Species breeding in seeps further diversified within the island of Kauaʻi. Our results highlight including geologic changes over time in evolutionary studies and increase understanding of diversification patterns, biogeography, and adaptive radiation on islands. This dataset contains scripts, data, and files used to perform the study described above and published in the article "Diversification and Evolution of Hawaiian Megalagrion Damselflies (Pinapinao, Odonata: Coenagrionidae)" in Systematic Entomology. Methods Taxon Sampling We gathered a total of 56 specimens from all 21 extant species of Megalagrion, plus one undescribed species (Table 1). Specimens were sourced from the personal collection of Steve Jordan, the Florida State Collection of Arthropods, Brigham Young University, and fieldwork following local and state permitting requirements in Hawaiʻi. In addition to having complete sampling across the genus, we wanted to have representatives of each taxon from the different habitats in which they are found. With few exceptions, we obtained one representative of each species of Megalagrion from every major volcano, including multiple volcanoes on the same island, except the island of Hawaiʻi, where we obtained at least one specimen of each species on the island. We acknowledge the subspecies division between M. nigrohamatum nigrohamatum (Blackburn) and M. nigrohamatum nigrolineatum (Perkins) and include specimens from each subspecies. We did not obtain specimens from the two extinct species (M. jugorum (Perkins) and M. molokaiense (Perkins)) or from several extirpated populations of extant species. Historical dry specimens from both extinct species and several of these extirpated populations are present at the Bishop Museum in Honolulu, Hawaiʻi. However, DNA was not extracted from these specimens due to their taxonomic importance as irreplaceable types. In addition to our sampling of Megalagrion, we sampled representatives from 37 of 58 valid genera across core Coenagrionidae (sensu Dijkstra et al., 2014), which included all genera previously hypothesized to be closely related to Megalagrion (Table S1) (Dumont et al., 2010; E. W. O’Grady & May, 2003). We attempted to sample all genera that may be closely related to Megalagrion. Of the 21 genera from core Coenagrionidae not included in our sampling scheme, 10 are monotypic, and the other 11 genera account for a total of 55 species. Our purpose in including this level of outgroup sampling was to determine the placement of Megalagrion within Coenagrionidae, and to help estimate molecular divergence times and ancient biogeography. For diversification analyses with RPANDA, an extended taxon sampling set was used with additional species sampled from genera across core Coenagrionidae, totaling 411 samples (Table S2). DNA Extraction & Sequencing For each specimen, thoracic muscle tissue was extracted when possible; otherwise, a single leg was used. Specimens were vouchered at the Bean Life Science Museum, Brigham Young University in Provo, Utah. DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen, Cat. No. 69504). For at least one specimen from each genus, sequences of 1,306 nuclear loci were obtained through anchored hybrid enrichment (AHE) sequencing using a probe set designed to capture single-copy orthologs of exons across odonate genomes (Bybee et al., 2021). For the remaining samples of Megalagrion, we captured a reduced set of 90 nuclear loci (>50 kbp) as well as 2 mitochondrial loci (~1,000 bp). AHE sequencing was performed by Rapid Genomics (Gainesville, FL, USA) following their standard protocols. This genome-wide approach with multiple loci helps capture phylogenetic signals that represent the evolution of species as a whole rather than specific genes (Degnan & Rosenberg, 2006). Sequence Processing & Quality Control Anchored hybrid enrichment data were processed with a pipeline modeled after Breinholt et al. (2018). Raw sequence reads were cleaned and adapters were removed using fastp v0.23.2 (Chen et al., 2018). The 90 nuclear and 2 mitochondrial loci were assembled following Goodman (2023) using iterative baited assembly following Breinholt et al. (2018) with SPAdes genome assembler v3.15.4 (Bankevich et al., 2012) and the Tanypteryx hageni (Selys) reference genome (Tolman et al., 2023). A more closely related zygopteran or coenagrionid genome may have improved locus recovery and alignment accuracy. However, the anisopteran Tanypteryx hageni genome was used because probes were designed to be single copy with this genome and to be compatible with extended datasets across Odonata for further studies with broader taxon sampling. Sequences from the expanded probe set with 1,306 nuclear loci were filtered to include only the 90 nuclear loci also present in the smaller probe set. Contamination filtering was performed by excluding identical sequences from different genera using the -ublast function of USEARCH v11.0.667 (Edgar, 2010). For each locus, only one ortholog was chosen for each taxon based on bit score, length, and coverage. We excluded taxa for which fewer than 30% of the 92 loci (< 27 loci) were recovered. In our main sampling scheme, 1 of 104 samples (<1%) recovered fewer than 27 loci. In the extended sampling scheme used for RPANDA analyses, 28 of 411 samples (~7%) had fewer than 27 of the 92 loci. Some samples from both sampling schemes were sequenced with the expanded probe set of 1,306 nuclear loci, all of which were filtered to include only the 90 nuclear loci present in the smaller probe set. Anchored hybrid enrichment sequencing recovered not only the targeted loci, but also some flanking regions on either side of each locus. To align sequences for each locus, we used the L-INS-I algorithm from MAFFT v7.520 (Katoh & Standley, 2013). To ensure that the sequences aligned properly, we aligned the target region first, including the sequence used as a reference during assembly. The reference sequence was included to identify the position of the target region of the alignment. Then, we aligned the full-length sequences, including the flanking regions, to the alignment containing only the target region using the --addlong function of MAFFT. The target-region sequences (except for the target-region reference sequence) were then removed from the alignment, leaving an alignment of the full-length sequences, plus the target-region reference sequence. Aligned sequences were filtered using Aliscore 22.ii.2012 (Kück et al., 2010; Misof & Misof, 2009) and AliCUT v2.31 (Kück, 2009), which aim to remove alignment, ambiguous, or randomly similar sites. We trimmed the ends of alignments until there was at least 50% taxon coverage. This trimmed alignment was then split into three separate alignments, one for the target region (based on the position of the target-region reference sequence), and two for the flanking regions on each side. The two alignments for the flanking regions on either side of the target region were then concatenated. Loci with less than 50% taxon coverage were removed from further analysis. This process yielded two sets of alignments: one including only the target region, the other including only the flanking regions. The average pairwise distance between sequences in a concatenation of the flanking region alignments was more than three times the average pairwise distance between sequences in a concatenation of the target region alignments, indicating that the target and flanking regions have significantly different rates of evolution. Therefore, we chose to partition the target and flanking regions separately in phylogenetic analyses. Phylogenetic Analysis We performed our main phylogenetic analyses on two data sets. The first included a full alignment (both the target and flanking regions), and the second included a target-region-only alignment. We also performed phylogenetic analyses with the extended taxon sampling scheme and target-region-only alignment for RPANDA following the same procedure. For each analysis, we performed a partitioned maximum likelihood phylogenetic inference using IQ-TREE v2.2.0 (Minh et al., 2020). Our preliminary full partition scheme contained 92 or 184 partitions reflecting our alignments [92 loci × (1 target) or 92 loci × (1 target + 1 flanking)]. We used ModelFinder Plus (Kalyaanamoorthy et al., 2017) to select the optimal partition scheme by merging partitions until model fit ceased to improve, and to find the best-fitting model of evolution for each partition according to the Bayesian Information Criterion (-m MFP+MERGE option). We employed the identified partition scheme and evolutionary models to infer a maximum likelihood tree. To assess branch support, we conducted 1,000 ultrafast bootstrap replicates with nearest neighbor interchange optimization (-bnni 1000 option). Coalescent Species Tree Analysis We inferred species relationships using a multispecies coalescent approach implemented in ASTRAL (Mirarab & Warnow, 2015; Zhang et al., 2018) via the ASTER package v1.16.3.4 (Zhang et al., 2023). Individual gene trees were first reconstructed for each of the 92 loci from the full alignments using IQ-TREE with ModelFinder Plus to select the best-fitting substitution model for each locus. ASTRAL estimates the species tree by finding the tree that maximizes the quartet score given the input gene trees, accounting for incomplete lineage sorting under the multispecies coalescent model. Branch support was assessed using local posterior probabilities, which represent the probability of a quartet given the gene trees. To quantify gene tree discordance, we calculated Robinson-Foulds (RF) distances between each gene tree and the species tree using the phangorn package v2.11.1 (Schliep, K. P., 2011) in R. To visualize gene tree discordance in Megalagrion, we pruned trees to include only Megalagrion, then used the DensiTree function from the phangorn package to display all gene trees simultaneously as a cloudogram and compared them to the species tree generated by ASTRAL. Divergence Time Estimation To generate a starting tree for Bayesian divergence time estimation in BEAST, we used our maximum likelihood tree generated from the target-region-only alignment. From this tree, we reconstructed a maximum likelihood fossil calibrated tree using the chronos function from ape v5.7.1 in R (Paradis et al., 2004). We used four fossils to calibrate the tree (Agriocnemis lacteola Selys, Enallagma florissantella Cockerell, Diceratobasis worki Poinar, and Ischnura velteni Bechly, Table 2). These fossils were diagnosed using apomorphies in their respective descriptions, and none of them have been used as taxa in phylogenetic datasets previously, though I. velteni has been used for divergence time estimation (Blow et al., 2021). Each was placed at the node of the most recent common ancestor of their respective genera. Because we only had one sample from Diceratobasis Kennedy, its most recent common ancestor node was the node connecting Diceratobasis and its sister. We constrained the age of the root using a wide range around previous estimates of the age of the core Coenagrionidae (70 ± 20 MA) (Suvorov et al., 2021). We used a strict clock substitution model (discrete substitution model with lambda smoothing parameter = 1). We used the preliminary fossil-calibration maximum likelihood tree from above as a starting tree for an unpartitioned Bayesian divergence time estimation using BEAST v2.7.6 (Bouckaert et al., 2014, 2019; Drummond & Rambaut, 2007). We used the maximum likelihood tree generated from an unpartitioned full alignment including both the target and flanking regions to constrain the topology using a multiple monophyletic constraint prior. We performed analyses with both fossil and geographic calibration (see below) priors, and with only fossil calibration priors (Table 3). For both analyses, we used the birth-death diversification model, an uncorrelated relaxed clock substitution model, and a GTR+G+I model of evolution with all parameters estimated. We used a root age prior with a normal distribution around 70 MA with a standard deviation of 10 MA based on previous estimates of the age of the core Coenagrionidae (Suvorov et al., 2021). We used four fossil calibration points with log-normal prior distributions (Table 2, same as fossil calibrations for preliminary calibration). Log-normal fossil calibration prior distributions were offset so that the minimum was equal to the minimum age of the fossil, and μ was adjusted until the 95% quantile was near the previously estimated crown age for core Coenagrionidae (Suvorov et al., 2021). We ran the MCMC chain for at least 10,000,000 generations, and saved a tree every 5,000 generations. For analyses with a geographic calibration, seven points were placed within Megalagrion clades with overwhelming phylogenetic evidence of following the progression rule, similar to previous studies (Haines et al., 2014; Johns et al., 2018). The emergence times of Oʻahu and Maui Nui were used for these calibration points. Within each clade, calibration points were placed at the most recent common ancestor of the Maui Nui/Hawaiʻi clade based on the emergence of Maui Nui and the most recent common ancestor of the Oʻahu + Maui Nui/Hawaiʻi clade based on the emergence of Oʻahu. The emergence time of Kauaʻi was not used for any geographic calibration points because taxa from Kauaʻi may have had ancestral ranges that once included islands that are now submerged. Geographic calibrations had a uniform prior distribution with a maximum equal to the emergence time of the island and a minimum of zero. Our geographic calibration strategy assumes that if a clade endemic to a younger range has a sister endemic to the next-oldest island, then the clade with the younger range does not predate the formation of its current range. Clades that do not have an extant sister on Kauaʻi are not included in this assumption, allowing for the possibility that the clade could have originated on Kauaʻi or older islands that are now submerged. For example, there is the possibility that the closely related terrestrially-breeding species M. oahuense and M. nesiotes have an extinct sister that once inhabited Kauaʻi or an older island; therefore, this clade is not used to aid in geographic calibration. The MCMC results were analyzed for convergence and to determine the appropriate amount of burn-in using Tracer v1.7.2 (Rambaut et al., 2018). Maximum clade credibility trees were generated from the posterior trees using TreeAnnotator v2.7.6 with 20% burn-in. Node heights were set to common ancestor heights. To validate our calibration strategy and address potential concerns about interactions between multiple node age constraints, we performed prior-only analyses sampling from calibration priors without molecular data. We analyzed all 16 key parameters, including 12 calibration nodes (5 fossil-based, 7 geographic-based) and 4 evolutionary model parameters (birth rate, death rate, clock rate, tree height). Ancestral Range Estimation We performed ancestral range estimation using BioGeoBEARS v1.1.3 in R (Massana et al., 2015; N. Matzke, 2013; N. J. Matzke, 2013, 2014; Van Dam & Matzke, 2016). We use the dated tree from our BEAST analysis with both fossil and geographic calibration priors. Previous biogeographic analyses of Coenagrionidae with a higher-level taxonomic sampling, including Megalagrion and various Ischnurines, were not able to determine the geographic origin of Megalagrion. This is likely because Megalagrion is sister to the globally distributed diverse clade Ischnurinae, making it difficult to determine the origin of Megalagrion using only phylogenetic relationships and distribution data. We chose to focus our analysis on biogeographic patterns within Megalagrion rather than repeat an attempt to determine the origin of Megalagrion. We pruned the tree to include only Megalagrion. Operational taxonomic units in BioGeoBEARS analyses should be phylogenetically distinct, genetically isolated populations; therefore, we chose to preserve island-endemic populations of species present on multiple islands as separate operational taxonomic units. We collapsed multiple representatives of monophyletic island-endemic populations. We coded islands into six areas (Fig. 1). The first area (K) included Kauaʻi, Niʻihau, and any unknown ancestral range, including islands that have since submerged. The second area (O) included Oʻahu. The third area (M) included Maui. The fourth area (L) included Lānaʻi. The fifth area (X) included Molokaʻi. The sixth area (H) included the island of Hawaiʻi. Kahoʻolawe was not included in the analysis because no Megalagrion damselflies have ever been recorded from the island. Area ‘K’ includes islands that have since submerged because some nodes in the tree are older than any of the existing main Hawaiian islands (S. Jordan et al., 2003). The islands of Maui Nui have been connected during the last glacial maximum 20 – 21 KA (Price & Elliott-Fisk, 2004). Oʻahu and Maui Nui have also been connected through Penguin Bank during periods of low sea level. Gene flow between Megalagrion populations on different islands of Maui Nui has been shown to be greater than between populations on Maui Nui and populations on the island of Hawaiʻi (Jones & Jordan, 2015; S. Jordan et al., 2005). We chose to code the islands of Maui Nui as separate areas because populations on different islands are still often phylogenetically distinct and have significant genetic differences, and marine channels represent a strong barrier to gene flow. By coding these islands separately, we also aimed to understand patterns of dispersal between the islands of Maui Nui and Oʻahu at a higher resolution. We performed a time-stratified analysis, where areas and ranges were not allowed if none of the islands from that area had yet emerged (Table S3). The earliest estimated emergence times of islands were used to define available areas (D. Clague, 1996). Thus, area K was always available since it included submerged islands and Kauaʻi, which still exists. Area O was only available after the emergence of Oʻahu (3.7 MA). Areas M, L, and X were only available after the emergence of the earliest land of Maui Nui (Penguin Bank, 2.2 MA). Area H was only available after the emergence of the first volcano of the island of Hawaiʻi (Kohala, 0.6 MA). To estimate the distance between islands, we drew contours at sea level around each area using the 2023 GEBCO Grid and terra v1.7.71 in R (GEBCO Compilation Group, 2023; Hijmans, 2024). We used the “distance” function from the terra R package to calculate the minimum distance between areas. These distances were used to create a distance matrix for DEC models with distance-dependent dispersal (+x). Dispersal multipliers were not applied, and the null range was not included (Massana et al., 2015). Twelve separate models were run in BioGeoBEARS. These 12 models included the DEC, DIVALIKE, and BAYAREALIKE base models, base models with jump dispersal (+J), base models with distance-dependent dispersal (+x), and base models with both jump dispersal and distance-dependent dispersal (+J+x). All 12 models were compared using the Akaike information criterion (AIC) (Table S3), and the model with the lowest AIC value was selected as our final result. We included the +J model in our analysis because jump dispersal leading to founder event speciation is expected to have occurred in Megalagrion as individuals colonize new islands. Diversification and Extinction RPANDA We performed a diversification analysis using RPANDA v2.3 (Morlon et al., 2016) in R following a recent methodological development (Mazet et al., 2023). We used a maximum-likelihood tree with expanded taxon sampling, which included approximately half of the described species for the included genera. This tree was generated using a full-length alignment including both target and flanking regions. We used ape v5.7.1 in R to collapse monophyletic species with multiple samples. We dated the tree using TreePL v1.0 (Smith & O’Meara 2012) with penalized likelihood. We applied the same four fossil calibration constraints as in our Bayesian divergence time estimation (Table 2). We ran TreePL first with the “prime” option to identify optimal smoothing and rate parameters, then ran TreePL with these parameters and the “thorough” option to ensure convergence. TreePL provides a computationally efficient alternative to Bayesian methods like BEAST, making it well-suited for this dataset with extended taxon sampling. We listed all species of core Coenagrionidae according to the World Odonata List (Paulson et al., 2024). Sampling fractions were generated using RPANDA. A backbone tree was chosen to include 10 clades of interest across the tree, all containing at least 10 species (Table 4). Note that names for these clades are not meant to be valid taxonomic suggestions but are purely for convenience. All possible shift combinations were tested using five diversification models (BCST, BVAR, BCST_DCST, BVAR_DCST, BCST_DVAR). We did not test the BVAR_DVAR model, which allows both birth and death rates to vary. The BVAR_DVAR model has been criticized for being problematic and producing unrealistic estimates, and it is not currently recommended for use with RPANDA (Burin et al., 2019; Mazet et al., 2023). We tested backbone.option with “crown.shift” and “stem.shift” and set multi.backbone to “all”. Model selection used AICc values, with the combination yielding the lowest AICc across all models considered optimal. We ran the shift.estimates function with multi.backbone = “all” to explore parameter space. For the clades with estimated shifts, we calculated the difference between their diversification parameter values from the backbone (family-wide) diversification parameters to identify the magnitude and direction of rate changes in each shifted clade. Agent-based simulation modeling Diversification analyses such as BAMM and RPANDA are useful tools for identifying potential patterns of speciation and extinction rates across clades and over time. However, neither of these analyses incorporates the dynamic geology of the Hawaiian Islands, which is an important driver of diversification patterns in Megalagrion, and has likely been a driver of diversification in Megalagrion for millions of years prior to the emergence of Kauaʻi. Although we have no extant populations from these sunken islands, we simulated an island system like the Hawaiian Islands and allowed a simulated lineage to evolve in this island system for tens of millions of years. Using this simulation model, we explored potential diversification patterns in a lineage like Megalagrion, and further tested our hypothesis that extinction played a significant role in the evolution of Megalagrion due to geologic change over time. We created an agent-based simulation model in NetLogo v6.4.0 (Wilensky, 1999) following standard principles for designing agent-based models (ABMs) (Grimm & Railsback, 2012; Railsback & Grimm, 2012). The model is described following the Overview, Design concepts and Details (ODD) protocol (Supplementary methods) (Grimm, 2020; Grimm et al., 2006, 2010). The model is a generalized simulation of the evolution of a single lineage across an oceanic hotspot island system at a large time scale (10s of millions of years). The model can be divided into a geologic submodel and an evolutionary submodel. The geologic submodel models the creation, movement, and disappearance of islands in an oceanic-hotspot system. The simulated island system is modeled after patterns seen in the Hawaiian Islands, but is stochastic and generalized to simulate any oceanic hotspot system by adjusting geologic parameters. The evolutionary submodel models the evolution of a single lineage of organisms. This simulated lineage is modeled after Megalagrion, but is highly generalized and could be used to represent any lineage of organisms by adjusting biological and evolutionary parameters. Most biological and evolutionary parameters included in the model are not empirically known for Megalagrion, and/or do not correspond to easily measurable biological variables. Geologic, biological, and evolutionary parameters of the model are set manually and held constant throughout each run. However, variables such as island size, population size, number of species, extinction events, island age, endemism, the preferred niche of a species, genetic diversity, and others are not set, but are emergent from the output of the model. To validate the geologic submodel, geologic parameters were set to values similar to those empirically found in the Hawaiian Islands. We ran 54 simulations of the model, each for 20 million years. For each run, we recorded the average length of the island chain, the average island size, the average island age, and the average number of islands throughout the simulation. The average and standard deviation of these data across all simulations were used to compare simulated island chains to the Hawaiian Islands. At the end of each of these 54 simulations, we recorded the total number of species ever created and the number of extant species. The average and standard deviation of these data across all simulations were used to calculate the average speciation rate, extinction rate, and net diversification rate across all simulations. Ancestral State Reconstruction We performed stochastic character mapping on our maximum likelihood tree generated from a full alignment including both target and flanking regions using the make.simmap function of the R package phytools v2.1.1 (Revell, 2024) with the empirical Bayes method. Prior to ancestral state reconstruction, we selected the best evolutionary transition rate model by fitting three models to the data using the fitMk function (equal rates, symmetric rates, and all rates different) and evaluating model support based on AIC scores. We mapped breeding habitat and gill type separately. Characters were coded based on previous literature (Table 5) (S. Jordan et al., 2003; Polhemus, 1997; Polhemus & Asquith, 1996). Gills were coded as being foliate, lanceolate, or saccate. Breeding habitats were coded as lentic, lotic, seep, phytotelmata, or terrestrial. The breeding habitat and gill type of M. sp. (Jordan, et al. 2003) and M. mauka Daigle are not known. Therefore, two separate analyses were performed. For the first analysis, the character states of these species were given a flat prior distribution. The character states for these species were then estimated in the same way as character states were estimated for internal nodes. For the second analysis, these taxa were removed. We performed 10,000 simulations, then summarized all simulations by measuring the proportion of simulations that recovered each state at each node.
创建时间:
2025-10-07
二维码
社区交流群
二维码
科研交流群
商业服务