Environment but not geography explains genetic variation in the invasive and largely panmictic European starling in North America
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.j424f07
Description:
Populations of invasive species that colonize and spread in novel environments may differentiate both through demographic processes and local selection throughout the genome. European starlings (Sturnus vulgaris) were introduced to New York in 1890 and subsequently spread throughout North America, becoming one of the most widespread and numerous bird species on the continent. Genome-wide comparisons across starling individuals and populations can identify demographic and/or selective factors that facilitated this rapid and successful expansion. We investigated patterns of genomic diversity and differentiation using reduced-representation genome sequencing (ddRADseq) of 17 starling populations. Consistent with this species’ high dispersal rates and rapid expansion history, we found low genome-wide differentiation and few FST outliers even at a continental scale. Despite starting from a founding population of approximately 180 individuals, North American starlings do not seem to have undergone a detectable genetic bottleneck: they have maintained an extremely large effective population size since introduction. We find more than 200 variants that correlate with temperature and/or precipitation. Genotype-environment associations (but not outlier scans) identify these SNPs against a background of negligible genome- and range-wide divergence. Such variants fall in the coding regions of genes associated with metabolism, stress, and neurological function. This evidence for incipient local adaptation in North American starlings suggests that local adaptation can evolve rapidly even in wide-ranging and evolutionarily young populations. This survey of genomic signatures of expansion in North American starlings is the most comprehensive to date and complements ongoing studies of world-wide local adaptation in these highly dispersive and invasive birds.
Methods
Breast muscle tissue was sampled using biopsy punches (Integra Miltex) and frozen in 95% ethanol. Samples were shipped on dry ice, and DNA was extracted using a Qiagen DNeasy kit following the manufacturer's protocol (Qiagen). DNA concentration of each sample was quantified using a Qubit 2.0 fluorometer (Thermo Fisher Scientific). Following the protocol of Peterson et al. (2012), we generated a reduced-representation genomic data set of double-digested, restriction-site associated DNA (RAD) markers as described in Thrasher et al. (2017) using the restriction enzymes SbfI and MspI and adaptors P1 and P2. We sequenced 100-bp, single-end reads of the 160 best-quality libraries on an Illumina HiSeq 2500. We trimmed and filtered for quality using the fastx-toolkit (http://hannonlab.cshl.edu/fastx_toolkit). We then used the process_radtags command in stacks version 1.19 (Catchen et al., 2013) to demultiplex the remaining sequences. In subsequent filtering steps, we retained reads only if the following conditions were met: reads passed the Illumina chastity filter, contained an intact SbfI RAD site, contained one of the unique barcodes, and did not contain Illumina indexing adaptors.
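The four read-retention conditions listed above can be illustrated with a minimal Python sketch. This is not the process_radtags implementation: the function name `keep_read`, the example barcodes, and the adaptor fragment are hypothetical and chosen only for illustration. The sketch also assumes that each read begins with its barcode followed directly by the SbfI cut-site remnant.

```python
# Remnant left at the start of a read after SbfI cuts its recognition site CCTGCA^GG.
SBFI_REMNANT = "TGCAGG"
# Illustrative adaptor fragment (an assumption; the source does not list adaptor sequences).
ADAPTER_FRAGMENT = "AGATCGGAAGAGC"


def keep_read(seq, passed_chastity, barcodes):
    """Return True only if a read meets all four retention conditions."""
    # 1. The read must have passed the Illumina chastity filter.
    if not passed_chastity:
        return False
    # 2. The read must start with one of the known barcodes.
    barcode = next((b for b in barcodes if seq.startswith(b)), None)
    if barcode is None:
        return False
    # 3. An intact SbfI remnant must follow the barcode immediately.
    if not seq.startswith(SBFI_REMNANT, len(barcode)):
        return False
    # 4. The read must not contain adaptor sequence.
    return ADAPTER_FRAGMENT not in seq


# Hypothetical barcodes and reads, for illustration only.
example_barcodes = ["ACGTA", "TTGCA"]
good_read = "ACGTA" + SBFI_REMNANT + "ACGTACGTACGT"
print(keep_read(good_read, True, example_barcodes))
```

A real pipeline would also need to handle sequencing errors in the barcode and cut site, which this sketch ignores.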
Individual reads were mapped to a Sturnus vulgaris reference genome (Hofmeister, Rollins et al., in prep) using bowtie2 version 2.2.8 (Langmead & Salzberg, 2012) with the “very sensitive local” set of alignment presets, and the aligned sequences were then assembled into “stacks” using the ref-map option in stacks. Compared to a reference-free approach, the bioinformatics pipeline used for the reference-based assembly has the advantage of using fewer similarity thresholds to build loci. We required that a single-nucleotide polymorphism (SNP) be present in a minimum of 80% of the individuals (-r 0.8) with a minimum stack depth of 10 reads at a locus within an individual (-m 10) for it to be called. We removed two individuals, one with >50% missing data and one with >50% relatedness (measured using the unadjusted AJK statistic and calculated within vcftools), leaving 158 individuals in the study. A total of 15,038 SNPs were identified. We used the vcftools --hwe option to identify and filter SNPs out of Hardy–Weinberg equilibrium (HWE), that is, SNPs for which an exact test comparing expected and observed heterozygosity at polymorphic sites gave a p-value less than .001. About 6% of sequenced variants (904 variants) were out of HWE across all sampling sites. Given that (i) we are particularly interested in SNPs that may be specific to certain populations, and (ii) filtering for HWE did not change the results described in sections (1) and (2) below, we retain all 15,038 SNPs in the VCF file included in this upload.
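Two of the thresholds described above can be sketched in Python to make the logic concrete. The function names and example counts are hypothetical. `snp_passes` is only a loose analogue of the -r 0.8 and -m 10 settings, not the stacks implementation. `hwe_exact_p` is a plain enumeration-based exact test that sums the probabilities of all heterozygote counts no more likely than the observed one. It is written from the standard definition of the test and is not taken from the vcftools source.

```python
from math import exp, lgamma, log


def snp_passes(depths, min_depth=10, min_fraction=0.8):
    """True if at least min_fraction of individuals have >= min_depth reads at a locus."""
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


def hwe_exact_p(n_hom1, n_het, n_hom2):
    """Exact HWE p-value from genotype counts at a biallelic site."""
    n = n_hom1 + n_het + n_hom2   # number of genotyped individuals
    n_a = 2 * n_hom1 + n_het      # copies of the first allele
    n_b = 2 * n - n_a             # copies of the second allele

    def log_prob(het):
        # Log probability of observing `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2)
                + lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Possible heterozygote counts share the parity of n_a.
    probs = {h: exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_het]
    # Sum every outcome that is no more likely than the observed one.
    total = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read depths and genotype counts, for illustration only.
print(snp_passes([12, 15, 10, 3, 20]))
print(hwe_exact_p(50, 0, 50) < 0.001)
```

Under this sketch, a site would be flagged as out of HWE when `hwe_exact_p` returns a value below .001, matching the cutoff stated above.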
Created:
2022-03-07



