Something old, something new: evolution of Colombian weedy rice (Oryza spp.) through de novo de-domestication, exotic gene flow, and hybridization
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link: http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.sf7m0cg2q
Description:
Weedy rice (Oryza spp.) is a worldwide weed of domesticated rice (O. sativa), considered particularly problematic due to its strong competition with the crop, which leads to reduction of yields and harvest quality. Several studies have established multiple independent origins for weedy rice populations in the U.S. and various parts of Asia; however, the origins of weedy rice in South America have not been examined in a global context. We evaluated the genetic variation of weedy rice populations in Colombia, as well as the contributions of local wild Oryza species, local cultivated varieties, and exotic Oryza groups to the weed, using polymorphism generated by genotyping by sequencing (GBS). We found no evidence for genomic contributions from local wild Oryza species (O. glumaepatula, O. grandiglumis, O. latifolia and O. alta) to Colombian weedy rice. Instead, Colombian weedy rice has evolved from local indica cultivars, and has also likely been inadvertently imported as an exotic pest from the US. Additionally, weeds comprising de novo admixture between these distinct weedy populations now represent a large proportion of genomic backgrounds in Colombian weedy rice. Our results underscore the impressive ability of weedy rice to evolve through multiple evolutionary pathways, including in situ de-domestication, range expansion, and hybridization.
Methods
Plant material and DNA extraction
Weedy rice seeds were collected in 2014-2015 from the five principal rice production areas of Colombia: 26 accessions in the Central zone, 34 in the Llanos plains, 28 in the Bajo Cauca river valley, 26 on the North coast, and 26 in the Santanderes area, for a total of 140 Colombian weedy rice (CWR) accessions. These were additionally classified according to hull color (lemma and palea), pericarp color, grain size, and awn presence, abundance, and length. Nine commercial rice varieties were supplied by the National Federation of Rice Growers (Fedearroz): four varieties that are currently cultivated and have been on the market for 10 to 17 years (F473, cultivated in the low Cauca valley area; F2000, in the Santanderes and North coast zones; F174, in the Eastern plains; and F60, in the Central zone), one variety of national relevance that has been cultivated for approximately 17 years (F50), and four varieties that have left the market but were important over a 25-year period (Cica 6, Cica 8, Cica 9, and Orizica1). Additionally, 19 landraces (traditional crops grown under a manual dry-seeding system in the low Cauca valley area) from the germplasm bank of the Fedearroz Monteria section were sampled. We also obtained seeds of 22 wild South American Oryza accessions from the International Rice Research Institute (IRRI): seven O. glumaepatula, six O. grandiglumis, five O. alta, and four O. latifolia.
Staggered sowing of all selected materials (CWR, commercial rice, and landraces) was carried out in the Weed Science greenhouse at Universidad Nacional de Colombia. To break dormancy, weeds were exposed to 50°C for 12 hours (materials that did not germinate under this treatment were kept at 50°C for 72 hours). South American wild rice materials were planted in the Biology greenhouse at the University of Massachusetts, following a dormancy-breaking treatment at 50°C for 5 days. DNA was extracted from 200 mg of young leaf tissue with the Qiagen DNeasy® Plant Mini Kit. The concentration and purity of the DNA in the samples were quantified with a Qubit 2.0 fluorometer; for each sample, 30 μl of DNA at a concentration of 30-100 ng/μl was sent to the Cornell University Biotechnology Resource Center (BRC) for genotyping by sequencing (GBS).
GBS library preparation and sequence analysis
GBS was used to detect polymorphisms distributed throughout the genome among our samples (Elshire et al., 2011). Restriction digestions were carried out with the enzyme ApeKI, and the fragments were ligated to individual barcoded adapters and common adapters. DNA fragments were pooled for PCR amplification to enrich the libraries. Single-end reads of 100 base pairs (bp) were sequenced on an Illumina HiSeq 2500 platform. Raw GBS data have been deposited in the NCBI SRA (experiment SUB6244405). Initial data processing was performed at the Biotechnology Institute of Cornell University with the TASSEL-GBS v.3.0 pipeline (Glaubitz et al., 2014) and the MSU7 rice reference genome. The first filters removed SNPs with a minor allele frequency <0.01 or with >90% missing data per site. Additional filters were applied to the initial VCF file with the NGSEP pipeline (Duitama et al., 2014) at the University of Massachusetts. Final SNPs were supported by at least five reads (for the analysis of Colombian material) or three reads (for the global analysis), and displayed heterozygosity levels of <50%. SNPs with more than 15% missing data and individuals with more than 70% missing data were removed. Only biallelic SNPs were retained; all other variant types were filtered out.
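The thresholds described above can be illustrated with a minimal Python sketch. This is not the TASSEL-GBS or NGSEP implementation: the data layout (alternate-allele counts 0/1/2 per individual, `None` for missing, with a parallel read-depth matrix), the function names, and the order in which site and individual filters are applied are all assumptions made for illustration.

```python
def mask_low_depth(genos, depths, min_reads):
    """Set genotype calls supported by fewer than min_reads reads to None."""
    return [g if (g is not None and d >= min_reads) else None
            for g, d in zip(genos, depths)]


def site_passes(genos, max_missing=0.15, max_het=0.50, min_maf=0.01):
    """Apply the per-SNP thresholds to one biallelic site.

    genos: alternate-allele counts per individual (0, 1, 2), None if missing.
    """
    called = [g for g in genos if g is not None]
    if not called:
        return False
    missing = 1 - len(called) / len(genos)
    het = sum(1 for g in called if g == 1) / len(called)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return missing <= max_missing and het < max_het and maf >= min_maf


def filter_matrix(sites, depths, min_reads=5, max_sample_missing=0.70):
    """Return (kept SNP indices, kept individual indices).

    sites[i][j] is the genotype of individual j at SNP i; depths has the
    same shape. Assumed order: mask low-depth calls, filter SNPs, then
    filter individuals on the retained SNPs.
    """
    masked = [mask_low_depth(g, d, min_reads) for g, d in zip(sites, depths)]
    kept_sites = [i for i, g in enumerate(masked) if site_passes(g)]
    n_ind = len(sites[0]) if sites else 0
    kept_inds = []
    for j in range(n_ind):
        n_missing = sum(1 for i in kept_sites if masked[i][j] is None)
        if kept_sites and n_missing / len(kept_sites) <= max_sample_missing:
            kept_inds.append(j)
    return kept_sites, kept_inds
```

Calling `filter_matrix(sites, depths, min_reads=3)` would correspond to the more permissive read-support threshold used for the global analysis.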
Data generated for Colombian and South American samples were integrated with other published GBS datasets that included 128 cultivars, 173 weedy rice samples, and 53 samples of the wild ancestor of cultivated rice (O. rufipogon/O. nivara) from Asia (Huang et al., 2017; Vigueira et al., 2019), nine cultivars and 17 weedy rice samples from the United States (Burgos et al., 2014), and four outgroup samples (O. meridionalis and O. barthii) from these same datasets (NCBI Short Read Archive experiment SRX576894), for a total of 574 accessions.
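As a quick consistency check, the accession counts reported in the Methods can be summed to confirm the stated total. The group labels below are shorthand chosen for this sketch, and the check assumes that the total of 574 covers every group listed in the text.

```python
# Accession counts as reported in the Methods; labels are illustrative shorthand.
colombian_weedy = {
    "Central": 26,
    "Llanos": 34,
    "Bajo Cauca": 28,
    "North coast": 26,
    "Santanderes": 26,
}

groups = {
    "Colombian weedy rice": sum(colombian_weedy.values()),
    "Colombian commercial varieties": 9,
    "Colombian landraces": 19,
    "South American wild Oryza": 7 + 6 + 5 + 4,
    "Asian cultivars": 128,
    "Asian weedy rice": 173,
    "Asian wild ancestor (O. rufipogon/O. nivara)": 53,
    "US cultivars": 9,
    "US weedy rice": 17,
    "Outgroups (O. meridionalis, O. barthii)": 4,
}

total = sum(groups.values())
for name, count in groups.items():
    print(f"{name}: {count}")
print(f"Total accessions: {total}")
```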
Population analyses
Population structure was analyzed using STRUCTURE (version 2.3.3, Hubisz, Falush, Stephens, & Pritchard, 2009) on the Massachusetts Green High Performance Computing Cluster (http://www.mghpcc.org/). Because the program can handle only a limited amount of input data (Falush, Stephens, & Pritchard, 2003; Pritchard, Stephens, & Donnelly, 2000), approximately 10,000 SNPs were randomly selected for each analysis, with a spacing of roughly 15,000 bp. Heterozygotes were recorded as "N" and all simulations were run with data coded as haploid, because weedy and cultivated rice are highly self-pollinating. The program was run under the "admixture" ancestry model, with no correlated allele frequencies. Runs were carried out using K values between 1 and 15, with three replicates per K, a burn-in period of 100,000 iterations, and 500,000 subsequent iterations. The optimal number of genetic groupings was determined using ΔK (Evanno, Regnaut, & Goudet, 2005), as calculated by the program Structure Harvester (Earl & vonHoldt, 2012). The program CLUMPP (Jakobsson & Rosenberg, 2007) was used to obtain a single Q matrix for each K. The final matrix for each K value was visualized with Distruct (Rosenberg, 2004). For comparison, we also analyzed our complete SNP dataset for worldwide samples with the Bayesian clustering program fastStructure (version 1.0, Raj, Stephens, & Pritchard, 2014), without recoding heterozygotes and with no prior grouping. FastStructure runs were conducted for K from 2 to 8. The best K was determined with chooseK.py, and the POPHELPER R package was used to generate an image (Francis, 2017).
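The ΔK statistic mentioned above can be sketched in a few lines of Python. Following Evanno et al. (2005), ΔK is the absolute second-order difference of the log-likelihood across successive K values divided by the standard deviation of the log-likelihood at K. The sketch below computes the second difference from replicate means and uses the sample standard deviation; both choices are assumptions about implementation detail and are not taken from Structure Harvester itself. The function name and the input layout are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno's delta K from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of replicate ln P(X|K) values.
    Returns a dict K -> delta K for every K that has both neighbours
    (K-1 and K+1) and a non-zero standard deviation across replicates.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods, three replicates per K (not real results).
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -792.0, -788.0],
    4: [-785.0, -790.0, -780.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
```

Because ΔK needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested (here, K = 1 and K = 15 in the runs described above).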
To investigate the genetic divergence among individuals for all SNPs, the program SmartPCA from the Eigensoft package (Patterson, Price, & Reich, 2006; Price et al., 2006) was used. Figures with eigenvalues as coordinates were generated by RStudio 1.0.143.
To infer the phylogenetic relationships among samples, RAxML (Randomized Axelerated Maximum Likelihood) version 8 (Stamatakis, 2014) was used. The RAxML HPC2 on XSEDE tool was selected in the CIPRES portal (http://www.phylo.org/), with the GTRGAMMA model and 1000 bootstraps. Because SNP data contain only variable sites, an ascertainment bias correction (ASC) was applied. The best phylogenetic tree was plotted using iTOL v4 (Letunic & Bork, 2016).
Genetic diversity for each population was measured as the expected heterozygosity calculated across all loci, and pairwise FST was used to estimate genetic differentiation among populations, both with the software ARLEQUIN (ver 3.5.2.2., Excoffier & Lischer, 2010). Additionally, an AMOVA was performed to analyze variation among and within populations.
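To make the two summary statistics concrete, the following Python sketch computes expected heterozygosity as unbiased gene diversity, n/(n-1) × (1 - Σp²), and a simplified pairwise FST of the form (H_T - H_S)/H_T for a single locus. This is an illustration only: ARLEQUIN estimates pairwise FST within an AMOVA framework, which this simplified estimator does not reproduce, and the function names and input layout (lists of allele labels per population) are assumptions.

```python
from collections import Counter


def allele_freqs(alleles):
    """Return allele -> frequency for a list of sampled allele labels."""
    n = len(alleles)
    return {a: c / n for a, c in Counter(alleles).items()}


def expected_heterozygosity(alleles):
    """Unbiased gene diversity at one locus: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    freqs = allele_freqs(alleles)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))


def simple_fst(pop_a, pop_b):
    """Simplified single-locus F_ST = (H_T - H_S) / H_T for two populations.

    Uses uncorrected diversities and equal population weights; this is
    not the AMOVA-based estimator used by ARLEQUIN.
    """
    fa, fb = allele_freqs(pop_a), allele_freqs(pop_b)
    h_a = 1 - sum(p * p for p in fa.values())
    h_b = 1 - sum(p * p for p in fb.values())
    h_s = (h_a + h_b) / 2  # mean within-population diversity
    pooled = {a: (fa.get(a, 0) + fb.get(a, 0)) / 2 for a in set(fa) | set(fb)}
    h_t = 1 - sum(p * p for p in pooled.values())  # total diversity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t
```

Two populations fixed for different alleles give an FST of 1 under this estimator, while two populations with identical allele frequencies give 0.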
Created: 2020-03-16



