Population genetics and invasion history of the European Starling across Aotearoa New Zealand
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.6djh9w1bd
下载链接
链接失效反馈官方服务:
资源简介:
The expansion of human settlements over the past few centuries is responsible for an unprecedented number of invasive species introductions globally. An important component of biological invasion management is understanding how introduction history and post-introduction processes have jointly shaped present-day distributions and patterns of population structure, diversity, and adaptation. One example of a successful invader is the European starling (Sturnus vulgaris), which was intentionally introduced to numerous countries in the 19th century, including Aotearoa New Zealand, where it has become firmly established. We used reduced-representation sequencing to characterise the genetic population structure of the European starling in New Zealand, and compared the population structure to that present in sampling locations in the native range and invasive Australian range. We found that population structure and genetic diversity patterns suggested restricted gene flow from the majority of New Zealand to the northmost sampling location (Auckland). We also profiled genetic bottlenecks and shared outlier genomic regions, which supported historical accounts of translocations between both Australian subpopulations and New Zealand, and provided evidence of which documented translocation events were more likely to have been successful. Using these results as well as historic demographic patterns, we demonstrate how genomic analysis complements even well-documented invasion histories to better understand invasion processes, with direct implication for understanding contemporary gene flow and informing invasion management.
Methods
Sample Collection
A total of 106 starling specimen samples were obtained from various contributors within New Zealand from five geographically distinct locations between May 2022 and October 2023. Sampling covered three locations in the North Island, specifically in the Auckland region (AUK: n=18), the Manawatū-Whanganui region (WHA: n=12), the Wellington region (WEL: n=40) and two in the South Island in the Marlborough region (MRL: n=15) and Canterbury region (CAN: n=21). In addition to the newly obtained samples, we also incorporated sequence data from the native European range (Antwerp, Belgium; ANT: n=15, Newcastle, United Kingdom; NWC: n=15, Monks Wood, United Kingdom; MKW: n=15), as well as two locations from within the invasive Australian range (Orange; ORG: n=15, McLaren Vale; MLV: n=15) from a previously published Diversity Arrays Technology Pty Ltd sequencing (DArT-seq) dataset.
DNA Extraction and Sequencing
Extracted DNA from the newly collected New Zealand samples was sent to Diversity Arrays for sequencing. Sequencing was performed on an Illumina Hiseq2500/Novaseq6000.
Raw Sequence Processing
The previously published raw DArT-seq data, along with the MRL samples (January 2023 sequencing batch) were demultiplexed using stacks v2.2 process_radtags, while also discarding low quality reads (-q), reads with uncalled bases (-c), and rescuing barcodes and RAD-Tag cut sites (-r). It was not necessary to perform this step on the remainder of the new raw sequence data because DArT performed in-house demultiplexing using a proprietary bioinformatic pipeline.
For all the data, we used fastp v0.23.2 to remove adapter sequences and in the same step filtered reads for a minimum Phred quality score of 22 (-q 22) and a minimum length of 40 (-l 40). Both batches of sequence data produced as part of this study were additionally length trimmed to reduce the read length of the newer sequence data to match the base length of the older sequence data (-b 69).
Mapping, Variant Calling, and Filtering
We used the program bwa v0.7.17 to index the reference genome S. vulgaris vAU1.0 and align the trimmed DArT reads using the bwa aln function (-B 5 to trim the first 5 base pairs of each read), which is optimised for single-end short reads. This was then followed by the bwa samse function for producing the SAM formatted output files containing the alignments and their respective base qualities. Alignments were then sorted and indexed using samtools v1.16.1, and single nucleotide polymorphisms (SNPs) were subsequently called and annotated using bcftools v1.16 with the mpileup (-a "DP,AD,SP", --ignore-RG) and call (-mv, -f GQ) functions.
We removed known technical replicates and identified relatives from the data. vcftools v0.1.15 was used to remove indels (--remove-indels), and quality filter for a minimum site quality score of 30 (--minQ30), minimum genotype quality score of 20 (--minGQ 20), and minimum and maximum depth of coverage of 5 (--minDP 5) and 100 (--maxDP 100). Then, to account for batch effects that may impact the sequenced loci, we kept only SNPs present in at least 50% of the individuals in each sampling location. We ran one final filtering step to ensure appropriate levels of missingness and rare alleles using the following parameters: maximum missingness per site of 30% (--max-missing 0.7), minor allele count of 5 (--mac 5), and a minimum and maximum allele per locus of 2 (--min-alleles 2 --max-alleles 2), resulting in a dataset containing 19,174 SNPs and 141 individuals.
创建时间:
2024-10-22



