Adaptive potential of Coffea canephora from Uganda in response to climate change

NIAID Data Ecosystem, indexed 2026-03-14

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.6t1g1jx0m


Description:

Understanding the vulnerability of plant populations to climate change could help preserve their biodiversity and reveal new elite parents for future breeding programs. Landscape genomics is a useful approach for assessing putative adaptations to future climatic conditions, especially in long-lived species such as trees. We conducted a population genomics study of 207 Coffea canephora trees from seven forests along different climate gradients in Uganda. For this, we sequenced 323 candidate genes involved in key metabolic and defense pathways in coffee. Seventy-one SNPs were significantly associated with bioclimatic variables and were therefore considered putatively adaptive loci. These SNPs were linked to key candidate genes, including transcription factors, like DREB-like and MYB family genes controlling plant responses to abiotic stresses, as well as other genes of organoleptic interest, like the DXMT gene involved in caffeine biosynthesis and a putative pest repellent. These climate-associated genetic markers were used to compute genetic offsets, predicting population responses to future climatic conditions based on local climate change forecasts. Using these measures of maladaptation to future conditions, substantial levels of genetic differentiation between present and future diversity were estimated for all populations and scenarios considered. The populations from the Zoka and Budongo forests, in the northernmost zone of Uganda, appeared to have the lowest genetic offsets under all predicted climate change patterns, while populations from Kalangala and Mabira, in the Lake Victoria region, exhibited the highest genetic offsets. The potential of these findings for ex-situ conservation strategies is discussed.

Methods

Study species and sample selection: Uganda is divided into sixteen climate zones based on precipitation patterns as defined by Basalirwa (1995), five of which host C. canephora stands.
Within these five climate zones, 207 georeferenced trees were sampled from seven wild forests in 2012 and 2014 by the National Agricultural Research Organization (NARO, Uganda) and collaborators of the Institut de Recherche pour le Développement (IRD, Montpellier, France). These forests are: Budongo (n=65), Itwara (n=23), Kibale (n=19), Kalangala (n=10), Mabira (n=25), Malabigambo (n=16) and Zoka (n=49). Populations in Zoka, Budongo, Kalangala, Mabira and Malabigambo occurred in distinct climatic envelopes, while the climatic envelope of Itwara tended to overlap that of Kibale (Kiwuka et al., 2021). In each targeted forest, leaf samples were collected from five sub-sites separated by at least 5 km.

Selection of candidate genes and bait design: The 323 candidate genes (CGs) selected for the present study have been annotated and/or functionally characterized in previous studies. They all code for proteins already reported to play important roles in central metabolism or in plant responses and adaptation to abiotic stress. The CG sequences were retrieved from the whole genome assembly of C. canephora (Denoeud et al., 2014) according to the annotation available on the Coffee Genome Hub (http://coffee-genome.org/) (Dereeper et al., 2015). Probes were designed to cover each CG coding region as well as 1 kb upstream and 500 bp downstream flanking regions, so as to include putative regulatory regions. The 120 bp MyBaits® probes were designed with 2X tiling and synthesized by MYcroarray (Ann Arbor, Michigan, USA). A total of 21,306 probes were designed. Each candidate probe was BLASTed against the C. canephora genome (Denoeud et al., 2014) and filtered based on the manufacturer's stringent criteria (Mariac et al., 2022).

Library preparation and sequencing: DNA extractions for the 207 samples were performed at the IRD facilities from silica-gel dried leaves according to a previously described protocol (Mariac et al., 2006).
Genomic libraries were constructed using the protocols outlined in Rohland & Reich (2012) and Mariac et al. (2014). The 207 individual libraries were then capture-enriched in pools of 48 libraries using the synthetic RNA MyBaits® probes, following the MYcroarray protocol (Mariac et al., 2022). The enriched pools were quantified using real-time PCR and combined in equimolar ratios prior to sequencing on one lane with 150 bp paired-end reads on an Illumina HiSeq 3000 sequencer (GeT-PlaGe Platform, GenoToul, Toulouse, France).

SNP genotyping, calling and filtering: Sequence analysis was performed using scripts published by Mariac et al. (2014) and Scarcelli et al. (2016), which are also available on GitHub (https://github.com/Maillol/demultadapt; https://github.com/SouthGreenPlatform/arcad-hts/blob/master/scripts/arcad_hts_2_Filter_Fastq_On_Mean_Quality.pl). The mapping step was carried out using BWA MEM 0.7.5a-r405 (Li & Durbin, 2009) with the default option (-B 4) and the C. canephora assembly (http://coffee-genome.org/coffeacanephora) as reference. SNP calling was done using UnifiedGenotyper in the Genome Analysis Toolkit (GATK v3.6). SNPs located on the selected CG sequences were considered 'in-target' and all others 'off-target'. Two successive sets of filters were applied to the raw SNPs. We first discarded low-quality variants according to the quality criteria recommended by GATK, and selected only biallelic SNPs using VCFtools v0.1.13 (Danecek et al., 2011). We then applied additional filters for population genetic analyses and for association analyses, i.e. keeping SNPs with no excess of heterozygous genotypes (< 0.8), a minor allele frequency (MAF) greater than 5%, and in linkage equilibrium.
For the latter filter, SNPs were processed with PLINK 1.90b4 (Purcell et al., 2007) to retain only SNPs in approximate linkage equilibrium, based on the pairwise correlation between SNP genotype counts in 100 bp sliding windows with 10 bp steps (option --indep-pairwise). SNPs were considered correlated when r² > 0.5. These filters led to a total of 5,860 SNPs: 4,753 in-target and 1,107 off-target loci.

Bioclimatic data: Environmental factors (bioclimatic variables BIO1-19, Table S1) were downloaded from the WorldClim database (http://www.worldclim.org, Fick & Hijmans, 2017) at 30 arc-second resolution (~1 km) for 'Current conditions ~1960-2000'
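The second set of SNP filters described above (heterozygosity below 0.8, MAF above 5%, and pruning of pairs with r² > 0.5) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and does not call PLINK or VCFtools. The function name `filter_snps`, its parameters, and the greedy pruning order are assumptions made for illustration. The sketch also assumes a single contig, no missing genotypes, and a window measured in base pairs as the description states; PLINK's own window arguments may be interpreted differently.

```python
import numpy as np

def filter_snps(genotypes, positions, max_het=0.8, min_maf=0.05,
                window_bp=100, r2_max=0.5):
    """Return indices of SNPs that pass the three filters.

    genotypes: (n_samples, n_snps) array of alternate-allele counts (0, 1, 2).
    positions: base-pair coordinate of each SNP on a single contig (assumed).
    """
    g = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions)

    # Fraction of heterozygous genotypes per SNP.
    het_frac = (g == 1).mean(axis=0)

    # Minor allele frequency per SNP.
    alt_freq = g.mean(axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    candidates = np.flatnonzero((het_frac < max_het) & (maf > min_maf))

    # Greedy pruning (a simplification, not PLINK's exact procedure):
    # drop a SNP if it is strongly correlated with an already kept SNP
    # lying within window_bp of it.
    kept = []
    for j in candidates:
        linked = False
        for k in kept:
            if abs(pos[j] - pos[k]) <= window_bp:
                r = np.corrcoef(g[:, j], g[:, k])[0, 1]
                if r * r > r2_max:
                    linked = True
                    break
        if not linked:
            kept.append(int(j))
    return kept
```

Because every candidate has MAF above the threshold, its genotype vector is never constant, so the correlation is always defined. In practice the thresholds would be applied with the tools named in the text; the sketch only makes the logic of the three criteria explicit.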


Created:

2022-10-12
