Historical museum samples enable the examination of divergent and parallel evolution during invasion
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.dbrv15f2v
Description:
During the Anthropocene, Earth has experienced unprecedented habitat loss, native species decline, and global climate change. Concurrently, greater globalisation is facilitating species movement, increasing the likelihood of alien species establishment and propagation. There is a great need to understand what influences a species’ ability to persist or perish within a new or changing environment. Examining genes that may be associated with a species’ invasion success or persistence informs invasive species management, assists with native species preservation, and sheds light on important evolutionary mechanisms that occur in novel environments. This approach can be aided by coupling spatial and temporal investigations of evolutionary processes. Here we use the common starling, Sturnus vulgaris, to identify parallel and divergent evolutionary change between contemporary native and invasive range samples and their common ancestral population. To do this, we use reduced-representation sequencing of native samples collected recently in north-western Europe and invasive samples from Australia, together with museum specimens sampled in the UK during the mid-19th century. We found evidence of parallel selection on both continents, possibly resulting from common global selective forces such as exposure to pollutants. We also identified divergent selection in these populations, which might be related to adaptive changes in response to the novel environment encountered in the introduced Australian range. Interestingly, signatures of selection are equally common in contemporary samples from the invasive and native ranges. Our results demonstrate the value of including historical samples in genetic studies of invasion and highlight the ongoing and occasionally parallel role of adaptation in both native and invasive ranges.
Methods
DArTseq of 85 Sturnus vulgaris samples.
Excerpt from manuscript pertaining to variant calling and filtering for attached files.
Variant calling:
We used the STACKS v2.2 pipeline to process the raw DArTseq data. We used process_radtags to clean the tags: discarding low-quality reads (-q), removing reads with uncalled bases (-c), and rescuing barcodes and RAD-tags (-r). We used the Burrows-Wheeler Aligner (BWA) v0.7.15 aln function to align the reads to the S. vulgaris reference genome vAU1.0. FastQC revealed base sequence bias in the adapter region, so the first five bases were trimmed (-B 5) during alignment. The reads were then processed with BWA samse and SAMTOOLS v1.10, before SNP variants were called with STACKS gstacks (default parameters) and then populations (parameters given below).
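The alignment step described above could be scripted roughly as follows. This is a minimal sketch, not the authors' script: the helper `alignment_commands`, all file names, and the `samtools sort` step are assumptions made for illustration (the text only says the reads were processed with SAMTOOLS). Only `bwa aln`, the `-B 5` trimming option, and `bwa samse` come from the description. The commands are built as argument lists and printed, not executed.

```python
from pathlib import Path


def alignment_commands(reference, fastq, out_dir):
    """Build (but do not run) the per-sample alignment commands.

    All paths are hypothetical placeholders.
    """
    out_dir = Path(out_dir)
    sample = Path(fastq).name.split(".")[0]
    sai = out_dir / f"{sample}.sai"
    sam = out_dir / f"{sample}.sam"
    bam = out_dir / f"{sample}.bam"
    return [
        # -B 5: treat the first five bases as a barcode so they are trimmed (from the text)
        ["bwa", "aln", "-B", "5", "-f", str(sai), str(reference), str(fastq)],
        # Convert the single-end alignment coordinates into SAM format
        ["bwa", "samse", "-f", str(sam), str(reference), str(sai), str(fastq)],
        # Assumed step: coordinate-sort into BAM before variant calling
        ["samtools", "sort", "-o", str(bam), str(sam)],
    ]


cmds = alignment_commands("ref.fa", "sample01.fq.gz", "aligned")
for cmd in cmds:
    print(" ".join(cmd))
```

Building the commands as lists keeps each argument explicit and lets the same function be reused for every sample before handing the lists to a job runner.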
bwaaln_filter_allsample_rr_nofamily.recode.vcf file:
We generated a ‘population genetics’ variant file by running STACKS populations, filtering for a minimum per-population site call rate of 50% (-r 0.5), a minimum of 2 populations per site (-p 2), and a minimum locus log likelihood of -15 (--lnl_lim -15), with one random SNP per tag retained (--write_random_snp). We then used VCFTOOLS v0.1.16 to apply the following filters: maximum missingness per site of 10% (--max-missing 0.9), minimum minor allele frequency of 2.5% (MAF; --maf 0.025), minimum depth of 2 (--minDP 2), minimum genotype quality score of 15 (--minGQ 15), and a minimum p value of 0.001 for the site Hardy-Weinberg Equilibrium exact test (--hwe 0.001). We chose a high threshold for missingness to not bias the population genetics analysis against the historical samples, which had much higher levels of missingness than the contemporary samples. MAF filtering helps remove misreads, and HWE filtering removes highly non-neutral loci; both are important for capturing neutral population substructure. After filtering, we calculated individual relatedness and removed closely related individuals so that only one representative from each cluster remained in the final data. This resulted in a population genetic variant file of 3,840 SNPs used in the subsequent section ‘Population structure analysis’.
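To make the two main site-level thresholds concrete, the sketch below shows what a 90% call rate (`--max-missing 0.9`) and a 2.5% MAF (`--maf 0.025`) mean for a single biallelic site. It is an illustration on toy data, not a reimplementation of VCFTOOLS: the function names, the genotype encoding (tuples of 0/1 alleles, `None` for missing), and the treatment of values exactly at the threshold are assumptions. The depth, genotype quality, and HWE filters are not modelled.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype at one site."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Minor allele frequency at a biallelic site, ignoring missing genotypes."""
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def passes_site_filters(genotypes, max_missing=0.9, maf=0.025):
    """Keep a site only if enough individuals are called and it is not too rare."""
    return (call_rate(genotypes) >= max_missing
            and minor_allele_frequency(genotypes) >= maf)


# Toy sites with 20 individuals each
site_a = [(0, 0)] * 17 + [(0, 1)] * 2 + [None]  # 95% called, two heterozygotes
site_b = [(0, 0)] * 20                          # fully called but monomorphic
site_c = [(0, 1)] * 15 + [None] * 5             # only 75% called

for name, site in [("site_a", site_a), ("site_b", site_b), ("site_c", site_c)]:
    print(name, passes_site_filters(site))
```

Here only `site_a` is retained: `site_b` fails the MAF threshold and `site_c` fails the call-rate threshold.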
bwaaln_allsample_selection_histSNPs_maf025.recode.vcf file:
We generated a ‘selection’ variant file by using STACKS populations to process the data for all samples (with the --lnl_lim -15 and --write_random_snp flags) and then used VCFTOOLS to retain only SNPs present in at least 50% of the historical individuals (i.e. in at least 5 historical individuals), with additional quality filtering (--minGQ 15 --minDP 2), resulting in 12,219 SNP sites. The original populations variant file was then restricted to these sites and filtered with a minimum MAF of 2.5% to remove possible sequencing errors. This produced a data set that retained only SNPs sequenced in at least half of the historical individuals, as required for the selection analysis.
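The two-stage logic of this filter (first find sites genotyped in at least half of the historical individuals, then restrict the full data set to those sites and apply the MAF threshold) can be sketched on toy data as below. The function names, sample IDs, locus names, and genotype encoding are hypothetical, and the quality filters (`--minGQ`, `--minDP`) are omitted; this is a sketch of the idea, not the authors' workflow.

```python
def sites_in_historical(genotypes_by_site, historical_ids, min_fraction=0.5):
    """Return the sites genotyped in at least min_fraction of historical individuals."""
    keep = set()
    for site, calls in genotypes_by_site.items():
        n_called = sum(1 for ind in historical_ids if calls.get(ind) is not None)
        if n_called >= min_fraction * len(historical_ids):
            keep.add(site)
    return keep


def minor_allele_frequency(calls):
    """MAF across all individuals at a biallelic site, ignoring missing genotypes."""
    alleles = [a for g in calls.values() if g is not None for a in g]
    if not alleles:
        return 0.0
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def selection_sites(genotypes_by_site, historical_ids, min_fraction=0.5, maf=0.025):
    """Sites well covered in historical samples that also pass the MAF threshold."""
    covered = sites_in_historical(genotypes_by_site, historical_ids, min_fraction)
    return sorted(s for s in covered
                  if minor_allele_frequency(genotypes_by_site[s]) >= maf)


# Toy data: four historical (H*) and two contemporary (C*) individuals
historical = ["H1", "H2", "H3", "H4"]
data = {
    "locus1": {"H1": (0, 1), "H2": (0, 0), "H3": None, "H4": None,
               "C1": (0, 1), "C2": (1, 1)},
    "locus2": {"H1": (0, 0), "H2": None, "H3": None, "H4": None,
               "C1": (0, 1), "C2": (0, 1)},
    "locus3": {"H1": (0, 0), "H2": (0, 0), "H3": (0, 0), "H4": None,
               "C1": (0, 0), "C2": (0, 0)},
}

print(selection_sites(data, historical))
```

In this example `locus2` is dropped for insufficient historical coverage and `locus3` for being monomorphic, leaving `locus1`.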
Created:
2022-10-24



