Cannabis Pangenome Scaffolded Genomes
Source: plus.figshare.com. Updated 2024-05-30; indexed 2025-03-25.
Download link:
https://plus.figshare.com/articles/dataset/Cannabis_Pangenome_Scaffolded_Genomes/25889050/1
Resource description:
Abstract

Cannabis sativa is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally [1]. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of its genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio-phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of acyl-lipid thioesterase (ALT) genes [2] that are associated with fatty acid chain length variation and the production of the rare cannabinoids tetrahydrocannabinol varin (THCV) and cannabidiol varin (CBDV).
We conclude that the Cannabis sativa gene pool has only been partially characterized, that the existence of wild relatives in Asia remains likely, and that its potential as a crop species remains largely unrealized.

1. United Nations. Commodities at a glance: Special issue on industrial hemp. Commod Glance (2023) doi:10.18356/9789210019958.
2. Pulsifer, I. P. et al. Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. Plant Mol. Biol. 84, 549–563 (2014).

Pangenome assembly and scaffolding

All genomes labeled Hifiasm_HiC, Hifiasm_Trio_RagTag, Hifiasm_RagTag, and Hifiasm (Supplementary Table 1) were assembled using Hifiasm v0.16.1 [1]. When available, HiC data and HiFi parental trio data were also incorporated into the assembly process, defining the Hifiasm_HiC and Hifiasm_Trio_RagTag types, respectively. CLR (continuous long reads) assemblies were generated using FALCON Unzip from the PacBio SMRT Tools 9.0 suite [15], and genomes labeled CCS (circular consensus sequencing) were assembled with HiCanu v2.2 [16]. After assembly, HiC reads were aligned to the Hifiasm_HiC contigs using the Juicer v1.6.2 pipeline [2], followed by ordering and orientation using version 180922 of the 3D-DNA pipeline [3]. The scaffolded assemblies were then manually corrected using Juicebox v1.11.08 [4]. Hifiasm_RagTag and Hifiasm_Trio_RagTag assemblies were scaffolded using the split chromosomes of the 24 HiC-scaffolded genomes and error-checked with yak-0.1 (github.com/lh3/yak). Sourmash v4.6.1 [17] was used to generate a Jaccard similarity matrix between the chromosomes and each un-scaffolded assembly, and the most similar version of each chromosome, 1 through X, was concatenated to generate a reference for scaffolding via RagTag v2.1.0 [18]. If the similarity matrix identified the Y chromosome as the best match, the assembly remained un-scaffolded.
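The reference-selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' pipeline: it computes exact Jaccard similarity on full k-mer sets instead of sourmash MinHash sketches, the function names (`kmers`, `jaccard`, `pick_reference`) and the `chr1`/`chrX`/`chrY` labels are hypothetical, and the Y-chromosome rule is one possible reading of the text (skip scaffolding when the best Y match scores higher than the best X match).

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int = 5) -> Set[str]:
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_reference(assembly: str,
                   candidates: Dict[str, Dict[str, str]],
                   k: int = 5) -> Optional[Dict[str, str]]:
    """Pick, per chromosome label, the candidate version most similar to the assembly.

    candidates maps a chromosome label to {source genome: sequence}.
    Returns {label: chosen source genome}, or None when the best chrY match
    beats the best chrX match (assumed rule: leave the assembly un-scaffolded).
    """
    asm = kmers(assembly, k)
    best: Dict[str, Tuple[str, float]] = {}
    for label, versions in candidates.items():
        scores = {src: jaccard(asm, kmers(seq, k)) for src, seq in versions.items()}
        src = max(scores, key=scores.get)  # highest-scoring version of this chromosome
        best[label] = (src, scores[src])
    if "chrY" in best and "chrX" in best and best["chrY"][1] > best["chrX"][1]:
        return None
    # chrY is never part of the concatenated reference (chromosomes 1 through X only)
    return {label: src for label, (src, _) in best.items() if label != "chrY"}


# Toy data: two candidate versions per chromosome, from hypothetical genomes gA and gB.
candidates = {
    "chr1": {"gA": "ACGTACGTTTGACCA", "gB": "TTTTTTTTTTTTTTT"},
    "chrX": {"gA": "CCCCCCCCCCCCCCC", "gB": "GGGCCCAAATTTGGG"},
    "chrY": {"gA": "AGAGAGAGAGAGAGA"},
}
female_like = "ACGTACGTTTGACCA" + "GGGCCCAAATTTGGG"
male_like = "AGAGAGAGAGAGAGA"
choice = pick_reference(female_like, candidates)
skipped = pick_reference(male_like, candidates)
```

In a real run, the chosen chromosome versions would be concatenated into a single FASTA file and passed to RagTag as the scaffolding reference. MinHash sketches approximate the same Jaccard values at a fraction of the memory cost, which matters at genome scale.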
BUSCO v5.4.3 [21] with the eudicots_odb10 dataset and assembly-stats v1.0.1 (https://github.com/sanger-pathogens/assembly-stats) were run on all assemblies to measure completeness and contiguity, respectively.

1. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
2. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98 (2016).
3. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
4. Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101 (2016).
15. Chin, C.-S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 13, 1050–1054 (2016).
16. Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
17. Titus Brown, C. & Irber, L. sourmash: a library for MinHash sketching of DNA. J. Open Source Softw. 1, 27 (2016).
18. Alonge, M. et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
21. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. 2013–2015. Preprint at (2015).
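As a note on the contiguity measurement: the description does not say which statistic was reported, but N50 is the usual contiguity summary and is among the values assembly-stats prints. The sketch below shows how N50 is computed from a list of contig or scaffold lengths. The function name `n50` and the example lengths are illustrative assumptions, not values from this dataset.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Illustrative lengths only (total 290, so the halfway point is 145).
example = n50([80, 70, 50, 40, 30, 20])
```

A higher N50 for the same total length means the assembly is split into fewer, longer pieces, which is why chromosome-scale scaffolding raises it sharply.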
Provider:
plus.figshare.com



