Genomics-informed conservation units reveal spatial variation in climate vulnerability in a migratory bird
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.wh70rxwt9
下载链接
链接失效反馈官方服务:
资源简介:
Identifying genetic conservation units (CUs) in threatened species is critical for the preservation of adaptive capacity and evolutionary potential in the face of climate change. However, delineating CUs in highly mobile species remains a challenge due to high rates of gene flow and genetic signatures of isolation by distance. Even when CUs are delineated in highly mobile species, the CUs often lack key biological information about what populations have the most conservation need to guide management decisions. Here we implement a framework for rigorous CU identification in the Canada Warbler (Cardellina canadensis), a highly mobile migratory bird species of conservation concern, and then integrate demographic modeling and genomic offset within a CU framework to guide conservation decisions. We find that whole-genome structure in this highly mobile species is primarily driven by putative adaptive variation. Identification of CUs across the breeding range revealed that Canada Warblers fall into two Evolutionarily Significant Units (ESU), and three putative Adaptive Units (AUs) in the South, East, and Northwest. Quantification of genomic offset within each AU reveals significant spatial variation in climate vulnerability, with the Northwestern AU being identified as the most vulnerable to future climate change based on genomic offset predictions. Alternatively, quantification of past population trends within each AU revealed the steepest population declines have occurred within the Eastern AU. Overall, we illustrate that genomics-informed CUs provide a strong foundation for identifying current and potential future regional threats that can be used to manage highly mobile species in a rapidly changing world.
Methods
Resequencing sample collection and DNA extraction
We collected samples from an additional 181 breeding adult Canada Warblers from across the breeding range in North America in collaboration with multiple university researchers, private environmental companies, and state and federal agencies (Supplemental Figure 1). For DNA extraction, we collected blood from 134 individuals (~80 µl), via brachial venipuncture, preserved it in Queen’s lysis buffer, and stored it at room temperature. Blood (50-80 µl) was extracted using Qiagen DNeasy Blood and Tissue Kits (QIAGEN) and eluted into 100 µl of provided AE buffer. For the remaining 47 individuals, we collected tail feathers by pulling 2 tail feathers from each bird and storing feathers at -20C. We cut the calamus of one feather from the shaft and extracted the calamus using the modified Qiagen DNeasy Blood and Tissue protocol (Schweizer & DeSaix, 2023). After DNA extraction, we quantified samples using Qubit dsDNA assay.
DNA resequencing
We prepared the breeding samples for low coverage whole genome sequencing using a modified Nextera prep (Schweizer & DeSaix, 2023) with normalized DNA input. We sequenced samples in two libraries, 110 individuals on an Illumina HiSeq 4000 using paired end 150bp reads and 71 individuals on an Illumina NovaSeq 6000 using paired end 150bp reads. The 71 individuals on the NovaSeq were sequenced across multiple lanes to get to the targeted sequencing depth of 2-3X coverage per sample and included replicates of 32 samples with lower than 1.5X coverage from the HiSeq 4000 run.
Bioinformatic processing
We used Conda v4.13.0 (Anaconda Documentation, 2020) environments to manage bioinformatic packages on the RMACC Summit supercomputer managed jointly by Colorado State University and University of Colorado, Boulder. To process raw fastqs from the 181 individuals that underwent low coverage whole genome sequencing, we used Trim Galore v0.6.7 (Krueger, 2012), a wrapper for cutadapt v1.18 (Martin, 2011), and FastQC v0.11.9 (Andrews, 2010) to trim any remaining Illumina adaptors in the fastqs. Next, based on recommendations for low coverage data generated with NovaSeq platforms (Lou & Therkildsen, 2022) we performed a sliding window cut of the 3-prime end of the reads to remove low quality tails, defined as 4 bases in a row with mean QUAL scores less than 20, using fastp v0.22.0 (Chen et al., 2018). We checked fastqs for quality using FastQC and MultiQC v1.0.dev0 (Ewels et al., 2016) before and after trimming reads.
After processing raw fastqs, we aligned samples to the Canada Warbler reference genome using Burrows-Wheeler Alignment software (bwa mem, bwa v0.7.17) (Li & Durbin, 2009). Then we added read group information using Picard v2.26.11 AddorReplaceReadGroups (Picard Toolkit, 2014/2019)and marked duplicate reads using samtools v1.11 markdup (Danecek et al., 2011) before merging individuals with multiple bams. After merging bams, we checked sample coverage using bedtools v2.30.0 genomecov (Quinlan & Hall, 2010), and samples with less than 1X coverage were removed, leaving 169 individuals.
We used the processed bams to call variants using GATK v4.2.5.0 HaplotypeCaller (McKenna et al., 2010) and BCFtools v1.15.1 mpileup (Danecek et al., 2021). Then, we stringently filtered the variant sets using BCFtools, allowing only biallelic sites, a minor allele frequency of greater than 5%, QUAL score of greater than 30 and less than 10% missing across the 169 individuals. We intersected the filtered variant sets from bedtools and GATK to create a high-quality variant set to use for base quality score recalibration. Using the intersected variants, we recalibrated the sample bams using GATK BaseRecalibrator and ApplyBQSR. With the recalibrated bams, we used HaplotypeCaller to call a recalibrated set of variants. Then we filtered the recalibrated variant set allowing only biallelic sites, a minor allele frequency of less than 5%, QUAL score of greater than 30 and less than 20% missing data across the 169 individuals.
Using the recalibrated, filtered variant set we performed an exploratory analysis using R (R Core Team, 2022) and the package srsStuff (Anderson, 2020) to produce single-read sampling principal components analysis (PCA) of whole genome structure. We used single-read sampling because differences in the average coverage across samples can be mistaken for population structure on PCA in low coverage data (Lou & Therkildsen, 2022). Single read sampling equalizes coverage for all samples. Despite equalizing coverage, we found significant platform effects, where samples sequenced on different platforms have inherent bias that can be mistaken for population structure (for example of platform effects on low coverage data, see Lou & Therkildsen, 2022). We removed platform-associated variants from the dataset and proceeded with the analysis once samples no longer clustered in platform groups by PCA (for full methods to remove platform effects, see Supplemental Methods).
Adaptive loci identification
To select environmental variables, we used gradient forest (Ellis et al., 2012), an extension of random forest (Liaw et al., 2002), and 23 environmental variables potentially important to Canada Warbler breeding ecology based on previous research (Supplemental Table 3, Ferrari et al., 2018; Reitsma et al., 2020). Environmental data were extracted from each of 16 sampling locations, excluding two sampling sites with fewer than 4 individuals. To identify putatively adaptive loci, we used two approaches, redundancy analysis (RDA) and Latent Factor Mixed Models (LFMM). Once loci were identified using both RDA and LFMM, the union of loci discovered by both methods was used as our set of candidate adaptive loci which was filtered from the original vcf.
创建时间:
2024-10-31



