Disentangling complex histories of hybridisation: The genomic consequences of ancient and recent introgression in Channel Island monkeyflowers
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.cfxpnvxjw
下载链接
链接失效反馈官方服务:
资源简介:
Hybridization is a common feature of evolutionary radiations, but its genomic consequences vary depending on when it occurs. Since reproductive isolation takes time to accumulate, hybridization can occur at multiple points during divergence. Previous studies suggested that the taxonomic diversity in evolutionary radiations can help infer the timing of past gene flow events. Here, we assess the power of these approaches for revealing when gene flow occurred between two monkeyflower taxa (Mimulus aurantiacus) endemic to the Channel Islands of California. Coalescent simulations reveal that conventional four-taxon tests may not be capable of fully distinguishing between recent and ancient introgression, but genome-wide patterns of phylogenetic discordance vary predictably with different histories of hybridization. Using whole-genome sequencing and phylogenetic tests for introgression across the M. aurantiacus radiation, we identify signals of both ancient and recent hybridization that occurred between the island taxa and their ancestors. In addition, we find widespread selection against introgressed ancestry, consistent with polygenic barriers to gene flow. However, we also identify localized signals across the genome that may indicate adaptive introgression. This study highlights the power and challenges of trying to disentangle complex histories of hybridization. More broadly, our results illustrate the multiple roles that gene flow can play in evolutionary radiations: hybridization can expose genetic incompatibilities that contribute to reproductive isolation while also facilitating adaptation by transferring beneficial alleles between taxa. These findings underscore the dynamic interplay between the timing of hybridization and natural selection in shaping evolutionary trajectories within radiations.
Methods
Genome sequencing and variant calling
Leaf tissue from 27 plants was collected across four locations on Santa Cruz Island, California, USA (Table S1), consisting of red-flowered parviflorus, yellow-flowered longiflorus, and their putative hybrids. Tissue was dried in silica in the field, and DNA was isolated using the Zymo Plant and Seed DNA kit following the manufacturer’s instructions. Sequencing libraries were prepared according to Gaio et al (2022), with slight modifications. Bead-linked transposase from the Illumina Nextera XT Kit was used for initial tagmentation, generating insert sizes in the range of 400 – 1200 bp. Multiplexed libraries were sequenced on the Illumina Novaseq 6000 using paired-end 150 bp reads at the University of Oregon’s Genomics Core Facility.
New sequences from Santa Cruz Island were combined with previously generated whole genome sequences from various M. aurantiacus subspecies from Stankowski et al. (2019) and Short & Streisfeld (2023), resulting in a final dataset containing 74 individuals. Raw reads were filtered using fastp to remove reads with uncalled bases or poor quality scores (Chen et al. 2018). The retained reads were then aligned to the M. aurantiacus reference assembly (Stankowski et al 2019) using BWA version 0.7.17 (Li & Durbin 2009). An average of 90.56% of reads aligned (range: 75.45% – 95.33%), and the average sequencing depth was 10× per individual (range: 6–18×). PCR duplicates were marked using Picard (https://broadinstitute.github.io/picard/). Variant calling was performed following Stankowski et al. (2019). We then phased the VCF using BEAGLE (Browning & Browning 2007), and further filtered the VCF file for biallelic SNPs using vcftools (Danecek et al. 2011). The final data set contained 12,749,566 SNPs across all 74 samples. Finally, we ran UnifiedGenotyper with the EMIT_ALL_CONFIDENT_SITES option to output all variant and invariant genotyped sites.
Admixture, PCA, and phylogenetic analysis
Prior research has consistently revealed four monophyletic clades delineating the taxa in M. aurantiacus (Stankowski and Streisfeld 2015; Chase et al 2017; Stankowski et al 2019; Short and Streisfeld 2023). Clade A consists entirely of subspecies grandiflorus; clade B shows parviflorus and aridus as sister taxa; clade C consists entirely of aurantiacus; and the remaining taxa (longiflorus, calcyinus, and the three lineages of puniceus) from southern California comprised the diverse clade D. To determine how the newly sequenced island samples were related to the other taxa in the complex, we used Admixture (Alexander et al. 2009) to estimate ancestry proportions from all samples, with the number of clusters (K) set between 2 and 11. We ran the cross-validation error function in Admixture and report results based on the value of K with the lowest cross-validation error. Because of the potential for ongoing hybridization between parviflorus and longiflorus on Santa Cruz Island (Wells 1980) and the phenotypic similarity between longiflorus from the island and mainland (Tulig 2000), we ran Admixture again using only these samples to identify putatively unadmixed individuals. Plants from the island that showed no evidence of admixture at K = 2 were used to assign individuals to subspecies. To further assess clustering patterns among these samples, we also performed a principal components analysis (PCA) in Plink using the 31 samples from the island and the 7 longiflorus from the mainland. Admixed individuals were removed from further analysis, as they likely represented contemporary hybrids, which were not appropriate to include in the phylogenetic tests we used to estimate introgression.
To determine the phylogenetic relationships of samples from the island and mainland, we generated a maximum likelihood consensus tree using IQ-TREE v1.6.12 (Nguyen et al. 2015). We used a concatenated dataset consisting of 12,749,566 biallelic SNPs that did not include island samples that showed evidence of admixture in the above analyses (resulting in 63 individuals, see Results). We used 1000 bootstrap replicates to provide support values at each node.
Tests for the timing of admixture
To test for genome wide evidence of introgression, we used dsuite (Malinsky et al. 2021) to calculate the f4-ratio (Reich et al. 2009) for all possible trios of ingroup taxa, using M. clevelandii as the outgroup. The f4-ratio is derived from the D-statistics and their relatives, which measure asymmetries in the numbers of sites with ABBA and BABA patterns (where A and B are ancestral and derived alleles, respectively) across a phylogeny with three ingroup taxa and an outgroup that have the relationship (((P1, P2), P3) O). A significant excess of either pattern gives a nonzero value of D, which is taken as evidence that gene flow has occurred between P3 and one of the sister taxa. The f4-ratio then measures the proportion of the P2 genome that is derived from P3 and thus provides an estimate of the admixture proportion (Reich et al 2009). To estimate how introgression varied across the genome, we calculated another version of the admixture proportion (known as fd; Martin et al 2015) in 100 kb non-overlapping windows using the ABBABABAwindows.py Python script (https://github. com/simonhmartin/genomics_general). Similar to the f4-ratio, fd searches for asymmetries in the number of ABBA and BABA sites across the genome, but it has been optimized specifically for use in genomic windows. For the fd analysis, mainland longiflorus was set as P1, Island longiflorus was P2, parviflorus was P3, and M. clevelandii was the outgroup (also referred to as Test 1, below).
The fbranch statistic was designed to evaluate which taxa in an evolutionary radiation showed evidence of introgression and to assess the relative timing of that gene flow (Malinsky et al 2018, 2021). Specifically, given the f4-ratios calculated for all possible trios among a set of related taxa, fbranch uses the inferred phylogenetic relationship among these taxa, as well as variation in the phylogenetic distance between the sister taxa used as P1 and P2, to assign introgression to specific branches on a phylogenetic tree. The idea behind this test is the assumption that alleles introgressed into the common ancestor of P1 and P2 should be present in roughly equal proportions in both descendent lineages, making it difficult for any phylogenetic test to distinguish shared ancestral variation from introgression (Martin et al 2013; Short and Streisfeld 2023). However, by gradually increasing the phylogenetic distance between the two sister taxa used in these tests, fbranch should be able to detect introgression that occurred at various points farther back in time (Malinsky et al. 2021). Indeed, Malinsky et al (2018) used fbranch to conclude that there was extensive ancient hybridization between the ancestors of modern cichlid lineages, as well as recent hybridization among closely related species found in similar ecological niches.
Despite the availability of this test, fbranch (or any 4-taxon test of introgression) might present challenges for distinguishing recent from ancient introgression (Fig S1). Although fbranch tests all possible trios of ingroup taxa in a phylogeny, it does not account for shared genetic variation between the P3 taxon and its relatives. For example, if P1 and P2 are more diverged from each other than the P3 taxon is to its immediate relatives, then variation that arose in the ancestor of P3 and its sister taxa would be shared in all descendant taxa. Thus, even if gene flow occurred recently between P3 and P2, we would detect a shared signal of introgression between P2 and all the relatives of P3 (Fig S1). Alternatively, the same shared pattern of introgression could be caused by ancient gene flow that occurred prior to the divergence of these sister taxa. Consequently, the same signals of introgression obtained from 4-taxon tests can arise from distinct evolutionary histories, making it potentially difficult to determine whether introgression was recent or ancient.
To test the ability of fbranch to distinguish between recent and ancient introgression, we performed neutral, coalescent simulations in msprime v. 1.2.0 (Baumdicker et al. 2022). Specific details of each simulation can be found in the supplemental methods, but we began by testing three divergence times and three migration rates across three different demographic scenarios (Fig S2). We evaluated these scenarios using a common phylogenetic history that matched the evolutionary relationships found in the M. aurantiacus species complex (see Results). The specific scenarios included: i) ‘recent introgression,’ where gene flow occurred between parviflorus and Island longiflorus after the split between mainland and Island longiflorus; ii) ‘ancient introgression,’ where gene flow occurred between the ancestor of parviflorus and the ancestor of clades C and D; and iii) ‘recent and ancient introgression,’ which combined scenarios i and ii (Fig S2). fbranch was run separately for all divergence time and simulation scenario combinations, and then the different migration rates were tested for each demographic model using a single divergence time (see supplemental methods). Simulated results were compared to results from fbranch calculated from the empirical sequence data. In addition, we ran a fourth demographic scenario where no introgression occurred (to account for the effects of incomplete lineage sorting), which was used in the following section.
The relationship between genome wide phylogenetic discordance and introgression
To further describe the history of introgression with parviflorus, we explored patterns of phylogenetic discordance across the genome using TWISST (Martin & Van Belleghem 2017). Given a set of samples from multiple ingroup taxa, TWISST builds trees in genomic windows and then calculates the support for that topology among all possible topologies, which is referred to as the topology weighting. If all samples from each taxon are reciprocally monophyletic, then that topology is given a weighting of 1.0 for that window. Topological discordance among samples within a taxon will result in lower topology weightings for that window. We used TWISST to identify variation in the relationships among aridus, parviflorus, Island longiflorus, and mainland longiflorus in 100 kb genomic windows, with M. clevelandii as the outgroup. With four ingroup taxa, this resulted in 15 possible topologies, which were then partitioned into those that supported: a) the species tree, where aridus and parviflorus were sister and reciprocally monophyletic relative to Island longiflorus and mainland longiflorus; b) a set of recent introgression trees, where parviflorus and Island longiflorus were sister to one another relative to aridus and mainland longiflorus; and c) an ancient introgression tree, where parviflorus was sister to mainland and Island longiflorus relative to aridus. By quantifying variation in the support for these topologies across the genome, we should be able to identify how often ancient and recent introgression contributed to phylogenetic discordance. This is because adding a fourth ingroup taxon that is sister to P3 should allow us to distinguish the sharing of ancestral variation from signals caused by the introgression of lineage-specific alleles.
However, topological discordance can also be influenced by incomplete lineage sorting (ILS). To test if these discordant topologies could be caused by ILS, we performed ten iterations of each of the simulated demographic scenarios described above, including the scenario with no introgression. We ran TWISST on each simulated dataset to count the number of windows that showed complete support for the recent or ancient introgression topologies (i.e., a topology weighting of 1.0). Under incomplete lineage sorting alone, we would expect to find support for both the recent and ancient introgression topologies in the simulation where no gene flow was allowed. To further confirm that these topology weightings were impacted by introgression, we calculated FST between parviflorus and Island longiflorus in all 100 kb non-overlapping windows that showed complete support for the species tree, the recent introgression tree, or the ancient introgression tree. Windows that matched the species tree or the ancient introgression tree are expected to show greater divergence between parviflorus and Island longiflorus than genomic regions reflecting recent introgression. This is because genomic windows where there has been recent introgression should have more allele sharing (and thus lower FST) between the pair of hybridizing taxa than genomic windows supporting the species tree (where there is no history of gene flow) or ancient introgression (where the signal may have been lost over time due to either selection or drift). FST was calculated using the popgenwindows.py Python script (https://github.com/simonhmartin/genomics_general), with only variant sites included. Finally, to distinguish between recent and ancient introgression, we compared the empirical distribution of tree topologies to the four simulated scenarios to determine which demographic history most closely resembled our observed results.
The genomic consequences of introgression
Introgression can have multiple consequences on patterns of genetic variation across the genome. For example, under a model where selection against gene flow is polygenic, we expect a positive relationship between introgression and recombination rate and a negative relationship between introgression and genetic differentiation (Brandvain et al. 2014, Schumer et al. 2018, Liu et al. 2022). To estimate the relationship between introgression and recombination rate, we partitioned the fd values calculated in 100 kb windows into quantile bins of recombination rates that were calculated previously in 500 kb windows by Stankowski et al. (2019). We then calculated the mean and 95% confidence intervals for fd within each recombination rate quantile bin and fit a linear model, with fd as the dependent variable and the recombination rate quantile bins as the independent variable. We then used the emmeans package (Lenth, 2019) to perform pairwise comparisons of the estimated marginal mean fd values for the different quantile bins from the linear model. We used Spearman’s correlation coefficient to assess the relationship between fd and FST, calculated in the same 100 kb windows.
In addition, we determined whether genetic diversity among the hybridizing island taxa varied relative to the mainland taxa. Specifically, we asked whether the nucleotide diversity (π) and effective population size in both island subspecies was similar to the other taxa in the complex. We calculated p in 100 kb windows using PIXY version 1.2.5 (Korunes & Samuk, 2021), with both variant and invariant sites included. To identify differences in the effective population sizes of these taxa, we used PSMC (Li and Durbin 2011) to estimate the effective population size through time for all samples from the island taxa and their most closely related mainland relatives. In accordance with Stankowski et al. (2023), we assumed a generation time of 2 years and a mutation rate of 7x10-9 for these calculations.
Identifying signatures of adaptive introgression
Extensive heterogeneity in admixture across the genome raises the possibility that adaptive introgression has maintained foreign ancestry in particular regions. To determine whether regions of elevated fd display signatures of adaptive introgression between parviflorus and Island longiflorus, we calculated the frequency of Q95 sites found in windows across the genome (Racimo et al. 2017, Feng et al. 2024). Q95 sites are defined as those that are fixed (at a frequency of 1.0) in the donor taxon, near fixation (at a frequency greater than the 95th percentile) in the recipient taxon, and absent in the taxon that is sister to the recipient. Thus, any derived alleles that are present at high frequency in a recipient taxon that are also absent in its sister taxon but are fixed in a more distantly related donor taxon are candidates for adaptive introgression. Although other possibilities might generate an increase in the frequency of these Q95 sites (i.e., drift and bottlenecks), the Q95 statistic has been shown to be a powerful and reliable approach for detecting adaptive introgression (Racimo et al 2017). Moreover, genomic regions that display an excess of Q95 sites are even more likely to have been targets of positive selection, because selection will have correlated effects on linked sites. By contrast, other tests that rely on polymorphism or linkage disequilibrium are often less effective at detecting positive selection in introgressed regions due to known modifications of the haplotype structure and patterns of genetic diversity (Suarez-Gonzalez et al 2018; Moest et al 2020; Setter et al. 2020; Feng et al 2024), Thus, by identifying an elevated frequency of Q95 sites in 100 kb windows across the genome, we can identify genomic regions that may have been adaptively introgressed.
To identify Q95 sites, we first identified all derived SNPs that were polymorphic among samples. Derived SNPs were defined as those where one nucleotide was fixed in M. clevelandii but a different nucleotide was present in at least one chromosome from samples of mainland longiflorus, Island longiflorus, parviflorus, or aridus. We then determined the number of derived sites that were fixed in parviflorus, absent in mainland longiflorus, and present in at least one haplotype in Island longiflorus. These represented sites that were introgressed from parviflorus into Island longiflorus. The 95th percentile of the allele frequency distribution among introgressed sites in Island longiflorus (=0.875) was used as a cutoff to identify adaptively introgressed sites (i.e., Q95 sites) (Racimo et al. 2017, Feng et al. 2024). For each 100 kb window, we then divided the number of Q95 sites by the total number of derived SNPs that were found on at least one chromosome in Island longiflorus to estimate the frequency of Q95 sites in each window.
To identify signatures of adaptive introgression that were transferred from Island longiflorus into parviflorus, we identified derived SNPs that were fixed in Island longiflorus, present on at least one haplotype in parviflorus, but were absent in aridus. In this case, the 95th percentile of the allele frequency distribution in parviflorus included only sites fixed in parviflorus. We then divided the number of Q95 sites by the total number of derived SNPs that were present on at least one chromosome in parviflorus in 100 kb windows.
To further explore signatures of introgression for the region with the highest frequency of Q95 sites across the genome, we calculated fd, dxy, FST, π, and the frequency of Q95 sites in 10 kb windows with 1000 bp steps. We calculated p and dxy using PIXY version 1.2.5 (Korunes & Samuk, 2021), with both variant and invariant sites included, and FST was calculated using the popgenwindows.py Python script described above. Introgression will result in reduced dxy and FST between a pair of hybridizing taxa, because the taxa share alleles in this region. By contrast, there should be a corresponding increase in genetic divergence between the recipient and its sister taxa.
创建时间:
2025-06-25



