five

Limited migration from physiological refugia constrains the rescue of native gastropods facing an invasive predator

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.rxwdbrvjq
下载链接
链接失效反馈
官方服务:
资源简介:
Biological invasions have caused the loss of freshwater biodiversity worldwide. The interplay between adaptive responses and demographic characteristics of populations impacted by invasions is expected to be important for their resilience, but the interaction between these factors is poorly understood. The freshwater gastropod Amnicola limosus is native to the Upper St. Lawrence River and distributed along a water calcium concentration gradient within which high-calcium habitats are impacted by an invasive predator fish (Neogobius melanostomus, round goby), whereas low-calcium habitats provide refuges for the gastropods from the invasive predator. Our objectives were to 1) test for adaptation of A. limosus to the invasive predator and the low calcium habitats, and 2) investigate if migrant gastropods could move from refuge populations to declining invaded populations (i.e., demographic rescue), which could also help maintain genetic diversity through gene flow (i.e., genetic rescue). We conducted a laboratory reciprocal transplant of wild F0 A. limosus sourced from the two habitat types (high calcium/invaded and low calcium/refuge) to measure adult survival and fecundity in home and transplant treatments of water calcium concentration (low/high) and round goby cue (present/absent). We then applied pooled whole-genome sequencing of twelve gastropod populations from across the calcium/invasion gradient. We identified patterns of life-history traits and genetic differentiation across the habitats that are consistent with local adaptation to low calcium concentrations in refuge populations and to round goby predation in invaded populations. We also detected restricted gene flow from the low-calcium refugia towards high-calcium invaded populations, implying that the potential for demographic and genetic rescue is limited by natural dispersal. Our study highlights the importance of considering the potentially conflicting effects of local adaptation and gene flow for the resilience of populations coping with invasive predators. Methods 1.     Study system Gastropods have been widely used to study adaptation in response to predation (Brookes & Rochette, 2007; Hooks & Padilla, 2021), with abiotic factors such as calcium concentration modulating this response through changes in shell morphology and behavior (Rundle et al., 2004; Bukowski & Auld, 2014). As such, they are a useful biological study model for addressing evolutionary responses to biological invasions. Amnicola limosus is a small dominant freshwater gastropod species with a wide geographical distribution in the USA and Canada (www.gbif.org/species/5192461). This gastropod does not have a pelagic larval phase: egg masses are deposited on the substrate, and juveniles move from the substrate to the macro-algal substrate (Pinel-Alloul & Magnin, 1973). Part of the range of A. limosus has been invaded by Neogobius melanostomus (common name: round goby), a molluscivorous fish, from the lower Great Lakes and running downstream throughout the Upper St. Lawrence River (Hickey & Fowlie, 2005). Amnicola limosus is commonly found in the stomach contents of round gobies, and following the round goby invasion of Lake Saint-Louis, A. limosus populations experienced a 0.5-1 mm reduction in shell size (Kipp et al., 2012). Because the mean gape size of the round goby is larger than the maximum size of A. limosus, round gobies do not have to crush the gastropod, which suggests that shell size reduction is likely to be due to reduced predation pressure on smaller and less visible individuals (round gobies are visual predators; Kipp et al., 2012). A considerable reduction in small gastropod abundance (down to 2-5% of the original population size, with A. limosus being the most abundant species) and species richness in the Upper St. Lawrence River were also reported since the invasion of round gobies in this ecosystem in 2005 (Kipp et al., 2012). However, round gobies cannot tolerate low calcium concentrations (Baldwin et al., 2012; Iacarella & Ricciardi, 2015), and have not invaded the Ottawa River at its junction with the Upper St. Lawrence River (Calcium concentrations below 22 mg/L; Sanderson et al., 2021; Morissette et al., 2024). On the contrary, this low calcium concentration is not a physiological limit for A. limosus embryonic development (> 1.1 mg/L; Shaw & Mackie, 1990), and Pinel-Alloul & Magnin (1973) showed that A. limosus was present in the Ottawa river before the invasion of round gobies, indicating that this species can tolerate the calcium concentration found in the Ottawa river. These calcium-poor waters are thus acting as a refuge from round goby predation in this system (Astorg et al., 2021; Morissette et al., 2024). Calcium-poor waters could potentially provide demographic subsidies for the native populations at invaded sites (e.g., amphipods; Derry et al., 2013). 2.     Study sites, sample, and environmental data collection Twelve study sites were located at the junction of the Ottawa River and the St. Lawrence River near Montreal, QC, Canada (Fig. 1A). Ottawa River water is calcium-poor (10-15 mg/L calcium), and St. Lawrence River water is comparatively calcium-rich (30-40 mg/L) due to the different geological characteristics of their watersheds. These water masses join at the junction of two major river systems at Lake Saint Louis, a widening of the St. Lawrence River, but the calcium gradient persists between the north and south shores, and water masses are distinct (Vis et al., 1998). In 2005, round gobies invaded the upper St. Lawrence River and the southern shore of Lake Saint-Louis, but not the calcium-poor Ottawa River nor calcium-poor sites on the north shore of Lake Saint-Louis (Kipp & Ricciardi, 2012). We sampled twelve Amnicola limosus populations from the study sites in this fluvial ecosystem (Fig. 1A), with three populations fully in the Ottawa River, three fully in the St. Lawrence River, and six populations in the Lake St-Louis, including three on the north shore and three on the south shore. The gastropod species identification was verified in the lab. We confirmed that our sampling locations corresponded to distinct population units with our population structure analyses (Fig. 1B, 1C and S1). Two populations in close proximity (2.3 km; BEA and GOY) showed low levels of differentiation (FST = 0.005), but the scaled covariance matrix Ω indicated that these two populations were indeed distinct (correlation coefficient between pairs of populations ρij = 0.8, Fig. S1). We coded populations collected in the Ottawa River water as LCGA (low calcium- water and round gobies absent) and populations from the St. Lawrence River water as HCGP (high calcium water and round gobies present). Two populations had inverted patterns: RAF is calcium-poor, but round gobies are present (LCGP), and PDC is calcium-rich, but round gobies are absent (HCGA). It is noteworthy that PDC is located in a refuge habitat (wetland; Astorg et al., 2021) but is close to invaded sites (~8 km from the nearest invaded site). Field-collected A. limosus gastropods were obtained near the shore via hand picking and brought back to the lab for further processing (DNA extractions and the common garden experiment) in June-October 2017. Round goby abundance was measured in the field between July and September 2017 on a single occasion at each site. For this, each site was sampled using three seine net passes. The seine net used for sampling nearshore habitats was 9.14 m long by 1.8 m deep and 1/8 mesh on a 10 m distance. Round gobies were placed into bins and released after the three hauls. The geographic location and environmental characteristics of our sampling sites are detailed in Table S1. We measured dissolved oxygen (DO; mgL-1), pH, water temperature (oC), and conductivity (µS.cm-2) using a Professional Plus Model YSI multi-parameter probe (model 10102030; Yellow Springs Inc.) at each study site in 2017 at the time of gastropod collection. On the same occasions, we collected water samples and analyzed them for calcium (Ca), total phosphorus (TP), total nitrogen (TN), as well as dissolved organic carbon (DOC) at the GRIL-UQAM analytical lab (Supplementary Methods). Site-specific invasion status by round goby (invaded /refuge) is defined by presence/absence (Table S1). 3.     Reciprocal transplant experiment We conducted a laboratory reciprocal transplant experiment at the Université du Québec à Montréal (UQAM) with field-collected F0-generation A. limosus to investigate the response of gastropods from populations experiencing different source habitat types (low calcium/refuge Ottawa River LCGA or high calcium/invaded St. Lawrence River HCGP) to home and transplant water (calcium-poor water from the Ottawa River LCGA or calcium-rich water from the St. Lawrence River HCGP), in the presence or absence of round goby cues (Fig. 2A). The round goby cue treatment was used to test for predator effects on life history fitness components (adult survival and fecundity). A. limosus snails that were involved in the experiment were mostly at adult or sub-adult stages as we selected the largest individuals collected in the field and the dates of collection correspond to the presence of adult cohorts in the field (Pinel-Alloul & Magnin, 1973). Two additional water treatments were also tested: the artificial freshwater medium COMBO (Kilham et al., 1998), with and without the addition of calcium, to test for the specific effect of calcium (Ca) concentration on fitness components. The overall design was therefore a two (origin water: St. Lawrence River LCGA versus Ottawa River HCGP)  four (treatment water from St. Lawrence HCGP versus Ottawa River LCGA, COMBO growth media with/without Ca)  two (presence versus absence of round goby cue) factorial experiment, with 12 replicates (corresponding to our sampling populations) per treatment combination. As the experiment was conducted within a single generation, we acknowledge that our reciprocal transplant experiment did not allow us to differentiate plastic vs genetic vs maternal effects on the measured traits. We raised wild F0 individuals from the 12 populations in the laboratory for up to 73 days. Between 15 and 22 individuals (average: 19.61.3) were initially placed in 250 ml plastic cups with river water and reared in growth chambers (Thermo Scientific Precision Model 818) at 18°C with a light:dark photoperiod cycle of 12:12 hours. We fed A. limosus snails ad libitum with defrosted spinach every 2-3 days if needed or at each water change. Water in the water treatments was changed, and old spinach was removed every 3-4 days. For the round goby cue treatment, round gobies were kept in a 50-liter aquarium for two weeks before the experiment, set in a growth chamber at 18°C with a 12:12h light. Round gobies were fed 3-4 times a week with flake fish food (TetraFin). The round goby cue treatment was added as 5 mL of water from the round goby aquarium per A. limosus culture at each water change (every 2-3 days), which represents 2% of the volume of the culture. The addition of water was done manually with a 30 mL syringe. We recorded the survival of the adults and their fecundity (total number of eggs produced per individual) as response variables every 19  13 days throughout the experiment, using high-resolution stereomicroscopes (Olympus). However, due to the very low adult survival for all populations for the treatment testing the effects of calcium in growth media, we removed this comparison from further analyses (see Fig. S2). We analyzed fecundity (total number of eggs produced) and adult survival rates with a generalized linear model (GLM) and a generalized linear mixed effect model (GLMM) using the lme4 package in R respectively. We modeled fecundity with a negative binomial distribution and adult survival with a binomial distribution and a logit link function. We checked the models for overdispersion using the overdisp_fun function from https://bbolker.github.io. We tested both models with and without the random effect of populations, using an AIC approach corrected for small sample size (AICc) and the ΔAIC criterion to evaluate the random effects (kept when ΔAIC > 2) with the R package bbmle. Likelihood ratio tests were used to evaluate the fixed effects for both the GLM and GLMM models. For the GLM model of fecundity, we checked for the influence of outliers on the model, by using both visual and quantitative diagnostics of the leverage and Cook's distance. We did not find a consistent effect of outliers on this model and thus did not remove outliers. Fixed effect coefficients and their confidence intervals were converted to incident rate ratios (fecundity) and odd ratios (adult survival) using an exponential function. 4.     De novo genome assembly and pool-sequencing As pool-sequencing approaches require a reference genome (Schlötterer et al., 2014) to align the raw sequence reads and to conduct analyses that require information about the position of SNPs on the genome (e.g., to calculate diversity indices along the genome; Kofler, Orozco-terWengel, et al., 2011), we conducted a de novo genome assembly for Amnicola limosus. We extracted DNA from the tissue of one individual snail collected in 2017 using a standard Phenol Chloroform extraction method, after removing the shell and excising the mollusk guts to avoid contaminants. Briefly, tissue samples were placed in a digestion buffer containing proteinase K and digested at 55°C. DNA was then isolated using an isoamyl-phenol-chloroform solution, followed by ethanol precipitation. DNA quantity and quality were verified using a combination of different quality control methods: Qubit assay (Thermo Fisher Scientific Inc.), Tapestation (Agilent Inc.), and Femto Pulse (Agilent Inc,). Fragments longer than 1 kb were selected for further processing. Library preparation was performed using 10X Chromium Linked-Read library kit (10X Genomics Inc.) and sequenced on 3 lanes of Illumina HiSeqX PE150 at Genome Quebec. A long-read approach (10X) was selected for the sequencing as long-read-based assembly methods are more powerful at dealing with some issues (e.g. repeats, high GC content) arising in short-reads-based assemblies (Jung et al., 2020). We ran fastp v.0.23.4 on the three 10X paired-end reads to obtain the insert size, using the -Q option to disable quality filtering. Fastp results showed two estimated insert size peaks at 175 and 270 bp. Reads were assembled with Supernova v.2.1.1. The assembled genome is 1,899,346,312 bp in length, with 815,134 scaffolds and a N50 of approximately 5 kb. We estimated the genome size with Jellyfish 2.3.0 by reading simultaneously the R1 reads of the three runs of 10X sequencing using the options -F 3, -m 21, and -s 2G. The resulting histogram was then processed with GenomeScope http://qb.cshl.edu/genomescope/ (Vurture et al., 2017), which yielded an estimated haploid genome length of 382,882,063 bp, with 2.25% of repeats and 4.72% of heterozygosity. This is much smaller than the assembled genome size (1,899,346,312 bp) due to fragmentation. We also used Benchmarking Universal Single-Copy Orthologs BUSCO v5.2.2 (Manni et al., 2021) to assess gene completeness by searching for core mollusk orthologous genes, using the option -genome and the BUSCO.v4 lineage mollusca_odb10.2019-11-20. Most core genes were missing from our draft genome (74.7%), with only 19% complete core genes recovered (C:19%[S:18.1%, D:0.9%], F: 6.3%, M:74.7%, n=5295, with C: complete single copy BUSCO genes, S: complete and single-copy BUSCOs, D: complete and duplicated BUSCOs, F: fragmented BUSCOs, M: missing BUSCOs, n: total BUSCOs searched). We acknowledge that the draft assembly presented here is of low-quality and completeness. However, the draft genome was sufficient for our population genomics goals, as we were not attempting to resolve the precise genetic architecture of specific traits (Savolainen et al., 2013). For a similar application, see Brennan et al. (2022). For the pooled sequencing, we extracted DNA from the tissues of 40 individuals per pool/population using the same standard Phenol Chloroform extraction method mentioned above. We quantified all samples using a Picogreen ds DNA assay (Thermo Fisher Scientific Inc.) on an Infinite 200 Nanoquant (Tecan Group Ltd). Samples were normalized to a dsDNA concentration of 15 ng/µL, re-quantified, and pooled according to the sampling population. Thus, we created 12 pools of 40 individuals each at 15 ng/µL. Libraries were prepared with the NEB Ultra II kit for shotgun sequencing and sequenced on 5 lanes of HiSeq2500 125 bp pair-ended at Genome Quebec. The number of reads sequenced per population was between 187-248 million paired-end reads. We used the following formula to calculate the expected mean coverage: read length  number of reads / estimated haploid genome length. Given an estimated genome size of 382,882,063 bp, a read length of 125bp, and 93.7-124.2 million single-end reads sequenced, we calculated that our expected mean coverage was between 30-40X. We assessed the quality of our pool-seq Illumina libraries with fastqc 0.11.5, from which we obtained a percentage of repeats between 18.4 and 39.7%. Pool-seq presents some caveats as it offers limited information about linkage disequilibrium (Feder et al., 2012) and can lead to bias during the estimation of allele frequencies (Gautier et al., 2013). However a large array of methods have been developed to account for such biases and accordingly, we have used Baypass and poolFreqDiff to detect outlier loci as well as poolfstat and popoolation to estimate genetic diversity indices (Kofler, Orozco-terWengel, et al., 2011; Kofler, Pandey, et al., 2011; Schlötterer et al., 2014; Gautier, 2015; Wiberg et al., 2017; Gautier et al., 2022). 5.     Read processing and SNPs calling We prepared the assembled reference genome of Amnicola limosus by first indexing it with the Burrows-Wheeler Aligner ( BWA; Li & Durbin, 2009) v0.7.17 and with Samtools faidx v1.12, and by creating a dictionary with Picard Tools v2.23.3. We then used a custom pipeline for pool-seq quality processing, read alignment, and SNP discovery. We first trimmed reads with the function trim-fastq.pl from popoolation v1.2.2 (Kofler, Orozco-terWengel, et al., 2011) for a base quality of 20 and a minimum length of 50 bp, and assessed the quality of the trimmed reads with fastqc. We aligned trimmed reads to the reference genome with bwa-mem v0.7.17. We filtered out ambiguously aligned reads with samtools v1.13 using a score of 20 and sorted bam files with samtools. We used samtools flagstat to find the percentage of Illumina reads aligned to the reference genome, which was on average 53.5% SD 4.0%. We obtained an mpileup file with samtools mpileup, then filtered SNPs with a minimum coverage of five across all populations. We converted the mpileup file to a sync file with Popoolation2 v1.10.03 (Kofler, Pandey, et al., 2011), with a quality score of 20. The sync file was then converted to a "pooldata" object with the poolfstat package in R (Hivert et al., 2018), using a haploid pool size of 80 for all populations, a minimum read count per base of two, a minimum coverage of five and a maximum of 300, a minimal minor allele frequency of 0.0125 (to remove singletons) and discarding indels. This pipeline retained 21,312,700 biallelic SNPs. 6.     Detecting genomic signatures of selection To detect putative loci under selection, we used both outlier and environmental association analysis approaches with the hierarchical Bayesian models implemented in Baypass (Gautier, 2015). Baypass is advantageous in the context of our study as it enables the detection of outlier SNPs after taking demographic history into account, thus avoiding the confounding effects of population structure. The core model estimates the scaled covariance matrix Ω of population allele frequencies, which summarizes population history. The scaled covariance matrix Ω is based on the deviation of population allele frequency from an average allele frequency calculated across populations, where populations that are more closely related tend to have more similar deviations (i.e. covariance) from the average (Coop et al., 2010). The scaled covariance matrix Ω is calculated based on the whole SNP dataset (non-outliers and outliers) and is not related to pairwise FST except in some conditions (Coop et al., 2010). During the calculation of SNP-specific statistics of differentiation (XtX and BFis see below), this covariance is then explicitly accounted for through the scaled covariance matrix Ω (Gautier, 2015). The full dataset was divided into 27 pseudo-independent datasets to overcome computing limitations. The "pooldata" object from poolfstat was converted to the 27 sub-dataset Baypass input files with the "thinning" subsampling method and sub-sample size of 750,000 SNPs. We conducted the outlier analysis using the core model, to estimate the XtX statistic and associated p-value under a χ2 distribution with 12 degrees of freedom (bilateral test, Baypass manual). The XtX statistic is similar to a SNP-specific FST but corrects for covariance using the scaled covariance matrix Ω (Gautier, 2015). We considered SNPs as outliers when their p-value derived from the XtX estimator was < 0.001. The shape of the histogram p-values derived from the XtX statistics confirmed that they were well-behaved (A peak close to 0 for loci putatively under selection and a uniform distribution between [0,1] for neutral loci; Fig. S3B; François et al., 2016). The scaled covariance matrices Ω from the 27 sub-datasets were compared visually to assess the concordance of the results, and then the statistics obtained for each sub-dataset were combined (Baypass manual). For the environmental association analysis, we opted for the standard model STD under the Importance Sampling approach in Baypass, in which the association between covariables and SNP allele frequencies is assessed independently for each covariable. If this model is not able to distinguish the effect of the calcium concentration from the effect of round goby presence/absence, we should observe a large overlap between outlier SNPs associated with each covariable (see the supplementary material for a more in-depth explanation of the selection of the STD model). This model computes for each SNP its regression coefficient βik of the association between the SNP allele frequencies and a covariable to compare the model with association (βik ≠ 0) against the null model (βik = 0), from which a Bayes factor BFis is derived. We selected two environmental covariables: invasion status (presence/absence of the round gobies) and calcium concentration. We also estimated the C2-statistic with the STD model (Olazcuaga et al., 2021), which is more appropriate for binary variables and was used for the association with round goby presence/absence. We checked the Pearson correlation coefficient between covariables with the function pairs.panel() in the package psych in R, which was r = 0.71 (slightly above the recommended threshold for the regression method of |r| < 0.7, Fig. S4). For the calcium covariable, we ran three independent runs of the STD model with the random number -seed option, then computed the median BFIS across runs to ensure the convergence of the MCMC results. To check for convergence of the independent runs, we calculated the Förstner and Moonen distance (FMD; Förstner & Moonen, 2003) between the scaled covariance matrix Ω matrices from each sub-dataset with the fmd.dist function in R (included in Baypass). Results were found to be convergent, with all FMD values < 0.12. Covariables were all standardized to  = 0 and  = 1, and we used default parameters for the MCMC analyses. For the calcium association, SNPs were considered significantly associated with a covariable when BFis > 20 dB (Jeffrey’s rule for “decisive evidence”; Gautier, 2015). For the association with round goby presence/absence, we used the R package qvalue to correct for multiple hypothesis testing on the p-values derived from the C2-statistic and applied a False Discovery Rate of α = 0.01 as a q-values cut-off for outlier detection (Olazcuaga et al., 2021). As a complementary analysis to investigate the potential for adaptation to the invasive predator and low calcium concentrations, we used poolFreqDiff (Wiberg et al., 2017) to identify outlier SNPs showing consistent allele frequency differences in the same direction between replicate populations contrasting the two calcium concentration (low vs high) and round goby predation (absent vs present) environment types. Note that with this approach, our aim was not to identify independent instances of parallel adaptation but rather detect genotype-environment associations; consistent allele frequency differences could arise due to adaptation occurring in a shared recent ancestor. This method relies on modeling allele frequencies with a generalized linear model (GLM) and a quasibinomial error distribution, which should result in a uniform distribution of p-values between [0,1] under the neutral (null) scenario (Wiberg et al., 2017). It also accounts for bias in allele frequency estimation (e.g., Gautier et al., 2013) by rescaling allele counts to an effective sample size neff (Feder et al., 2012). We ran the analysis separately for the two covariables as binary comparisons: invasion status (presence/absence) and calcium concentration (low < 24.3 mg/L, high > 34.3 mg/L). For the minimum read count per base, and the minimum and maximum coverage, we used the same values as for the poolfstat filtering, and we also rescaled the allele counts with neff and added one to zero count cells. To account for demography and genetic structure, we applied the empirical-null hypothesis approach (François et al., 2016) to recalibrate p-values based on a genomic inflation factor of λ = 0.85. We confirmed that recalibrated p-values were well-behaved based on the observed peak close to 0 and the uniform distribution between [0,1] (Fig. S5; François et al., 2016). We then transformed the recalibrated p-values into q-values with the R package qvalue, and defined outliers if their q-value was below the FDR α = 0.01. We considered candidate SNPs for adaptation to low-calcium concentration or to the round goby predation as SNPs that were in common between the environmental association analyses with Baypass and poolFredDiff and were uniquely associated with each covariable (Fig. S6). We also excluded SNPs showing inconsistent allele frequency associations with environmental variables, which were considered false positives (N = 2 for the association with the calcium gradient and N = 1118 for the association with round goby predation). Note that the positive or negative associations with the co-variables do not correspond to positive or negative selection, as the reference allele is chosen arbitrarily in poolfstat (and thus in Baypass). 7.     Population structure, genetic diversity, and demography We first estimated population structure with a genome-wide pairwise FST matrix from the poolfstat package, using the same parameters as described above and removing the outlier SNPs detected with the Baypass and poolFredDiff analyses. We used the pairwise FST matrix to visualize phylogenetic relationships between populations with an UPGMA tree implemented in the R package ape and evaluated the support of nodes with bootstrapping (N = 1000). The pairwise FST matrix was also used to assess the potential for isolation by distance, using the relationship between the genetic distance (FST/(1- FST); Rousset, 1997) and the log of the geographical distance (2D distribution of populations) with a Mantel test (9999 permutations) using the vegan package in R. Distances between sites were obtained by measuring paths between populations along the rivers (in m) with Google Earth v.10.38.0.0. Two outliers were present In the IBD analysis, but the relationship was still significant when these two data points were removed. We tested for isolation by environment, by calculating the environmental distance between population pairs using the squared Mahalanobis distance, determined from the calcium concentration and round goby presence/absence with the R package ecodist. We verified that there was no correlation between the environmental distance and geographic distance (non-significant Mantel test with 9999 permutations: r2 = 0.03, p-value = 0.20). Then we tested for a relationship between environmental distance and genetic distance as FST/(1-FST) with a Mantel test (9999 permutations). Finally, we compared the results of the pairwise FST matrix with the scaled covariance matrix Ω of population allele frequencies from the core model in Baypass, which summarizes some aspects of population history, though the scaled covariance matrix Ω is based on the complete dataset and thus includes outliers and putatively neutral SNPs. To test if the round gobies impacted levels of genetic diversity in invaded A. limosus populations, we compared diversity indices between invaded and refuge populations. However, it should be noted that moderate bottlenecks do not necessarily trigger decreases in nucleotide diversity and heterozygosity (Adams & Edmands, 2023), particularly in the initial generations after a bottleneck or in the presence of high levels of standing genetic variation or gene flow (Gompert et al., 2021). We obtained the observed heterozygosity from the poolfstat package and compared heterozygosity levels between round goby-impacted and refuge populations with a t-test after checking for the assumptions of normality (qqplot) and homoscedasticity (Bartlett test). We calculated genome-wide diversity indices (Tajima's pi, Watterson theta, and Tajima's D) using popoolation (Kofler, Orozco-terWengel, et al., 2011). First, we generated mpileup files for each population separately with samtools (Li et al., 2009) from the sorted.bam files output by the custom pipeline. Then we computed the genome-wide diversity indices using non-overlapping windows of 100 kb, a minimum coverage of 20 (as recommended in Kofler, Orozco-terWengel, et al., 2011 except for Tajima’s D with minimum coverage = 13, as the corrected estimator requires the pool size < 3 minimum coverage), a minimum quality of 20, a minimum fraction covered of 0.05 and a pool size of 40. It should be noted that popoolation calculates the diversity indices along chromosomes; thus, due to the fragmentation of our draft genome, the diversity indices were calculated mostly among separate contigs and in windows < 100 kb. The minimum number of SNPs per window across populations using this filter was 25. We used Hedge's G to detect a potential difference in the three diversity indices between the invaded and refuge populations.  We also investigated the demographic history of three population pairs using the diffusion approximation method implemented in δaδi (Gutenkunst et al., 2009). We aimed to detect a potential bottleneck in the invaded populations and to quantify the magnitude and direction of gene flow between the two habitat types (invaded and refuge). Due to the significant computational time required for these analyses, we selected a subset of populations from each habitat type for our models; we selected the populations PB-LCGA, IPE-LCGA, and PDC-HCGA as refuges, paired with PG-HCGP, BEA-HCGP, and GOY-HCGP as invaded populations respectively. PDC-HCGA was of particular interest as a high calcium population located in an uninvaded wetland (refuge). Our most complex model (Fig. S7) has defined effective population sizes after the split (nu1 and nu2), followed by a bottleneck in both populations (modeling a scenario in which the round goby invasion impacted population abundance at the whole ecosystem scale) followed by exponential recovery in both populations. TS is the scaled time between the split and the bottleneck and TB is the scaled time between the bottleneck and present. Migration rates are asymmetric but constant through time after the split, with mIR the migration from refuge to invaded populations and mRI in the opposite direction. As we knew the time of the potential bottleneck (12 years before sampling with one generation per year), we set TB as a fixed parameter. Because TB is set as a fixed parameter, the parameter  was an explicit parameter in the models that included a bottleneck. We defined θ with μ the mutation rate of  substitutions per site per year from the Caenograstropoda species Nucella lamellose (McGovern et al., 2010), and L the effective sequenced length, calculated as . We investigated six additional non-nested models: a) bottleneck and growth only in the invaded population (constant Ne for the refuge population) with uneven migration, b) only bottlenecks in both populations without recovery and uneven migration, c) only bottleneck in the invaded population with uneven migration (constant Ne for the refuge population), d) a simple population split at TS with uneven migration, e) a population split with symmetric migration and f) a population split without migration (Fig. S7). The default local optimizer was used on the log of parameters with random perturbation of the parameters to obtain a set of parameter values resulting in the highest composite likelihood. Optimization was conducted repeatedly until convergence was reached (i.e., three optimization runs with log-likelihood within 1% of the best likelihood). One model in one population pair did not reach convergence after 30 optimization runs (bottleneck without recovery for the PDC-HCGA and GOY-HCGP population pair). Finally, we compared our seven models based on the differences in the likelihoods and plots of residuals of the models. We did not use AIC for model selection as it is not appropriate to compare composite likelihoods generated by dadi when SNPs are linked (Noskova et al., 2020) As we obtained unlikely results during the conversion of parameters in our best models, possibly due to imprecise mutation rates, we did not conduct parameter conversion. To obtain uncertainties on our parameters while accounting for the effect of linkage, we used bootstrapping and the Godambe Information Matrix approach (Coffman et al., 2016). For this, we generated 100 bootstrapped datasets with a chunk size of  bp. Note that due to the draft genome fragmentation, the bootstrapping was executed between scaffolds of size usually smaller than 100 kb, which may have led to underestimated uncertainties. To address the potential effect of using a pool-seq approach on the variance in allele frequency estimates stemming from differences in coverage between pools (Gautier et al., 2013), we used a filter to obtain relatively homogenous coverage between our two selected populations/pools. From the initial SNPs dataset output by poolfstat (21,312,700 SNPs), we retained SNPs that fell within the 1st and 3rd quartiles of coverage in both populations (11-19X for PB-LCGA and 10-18X for PG-HCGP; 11-18X for IPE-LCGA and 9-15X for BEA-HCGP; 10-16X for PDC-HCGA and 10-17X for GOY-HCGP). We also filtered out SNPs that were detected as outliers (putatively under selection) in the Baypass (core and aux or STD models) and poolFreqDiff analyses and removed uninformative SNPs (fixed or lost in both populations). To accommodate for large computation time during the optimization, the datasets were thinned at random to retain a final dataset of ≈ 1 million SNPs per population pair. We used a custom script and the dadi_input_pools function from the genomalicious R package (Thia & Riginos, 2019) with the “probs” parameter in the methodSFS option to transform allele frequency data into the SNP data format from δaδi. We used δaδi to infer the folded SFS as we did not have information on the ancestral allele state. Due to low confidence in the low-frequency estimates, we also masked entries from 0 to 5 reads.
创建时间:
2024-08-30
二维码
社区交流群
二维码
科研交流群
商业服务