DNA methylation dynamics during stress-response in woodland strawberry (Fragaria vesca)
Indexed in NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/6141712
Resource description:
Genome sequence and annotation of Fragaria vesca cv. Reine des Vallées
To generate a reference genome for Fragaria vesca cv. Reine des Vallées, we used MinION long-read sequencing data to produce a substituted version of the F. vesca genome v4.0.a2. The detailed method was as follows:
Genome sequencing and assembly NIL Fb2
Genomic DNA was extracted from strawberry plants with a modified hexadecyltrimethylammonium bromide (cetrimonium bromide, CTAB) protocol (Healey, Furtado, Cooper, & Henry, 2014) and purified with Agencourt AMPure XP beads (cat# A63880). Long-read sequencing was performed for the genome assembly: a library was prepared following the Genomic DNA by Ligation protocol (Oxford Nanopore, cat# SQK-LSK109) as described by the manufacturer and sequenced on a MinION (Oxford Nanopore) for 72 h.
Reference genome polishing
Nanopore reads were filtered with Filtlong v0.2.1 (https://github.com/rrwick/Filtlong) using --min_mean_q 80 and --min_length 200. The filtered reads were then aligned with minimap2 v2.21 (H. Li, 2018), using the parameters -aLx map-ont --MD -Y, to the most recent version of the F. vesca genome, v4.0.a2, downloaded from the Genome Database for Rosaceae (GDR) (https://www.rosaceae.org/species/fragaria_vesca/genome_v4.0.a2). The resulting BAM file was sorted and indexed with samtools v1.11 (H. Li et al., 2009), and mosdepth v0.3.1 (Pedersen & Quinlan, 2018) was used to verify that coverage on the chromosome scaffolds exceeded 50×. Structural variants (SVs) of at least 30 bp were detected with Sniffles v1.0.12a (Sedlazeck et al., 2018) using the parameters -s 10 -r 1000 -q 20 --genotype -l 30 -d 1000. The VCF files produced by Sniffles were sorted and filtered with BCFtools v1.14 (Danecek et al., 2021) to keep only SVs that were shorter than 200,000 bp (we observed that larger SVs were mostly false positives caused by misalignments in regions containing gaps or Ns), supported by 10 or more reads, and with an allele frequency of at least 0.8 (we were interested in homozygous changes). The complete filtering command was:
bcftools view -q 0.8 -Oz -i '(SVTYPE = "DUP" || SVTYPE = "INS" || SVTYPE = "DEL" || SVTYPE = "TRA" || SVTYPE = "INV" || SVTYPE = "INVDUP") && %FILTER = "PASS" && FMT/DV>9 && SVLEN>29 && SVLEN<200000'
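The selection logic of the bcftools command above can be re-expressed in plain Python for a single-sample record, which makes each condition explicit. This is an illustrative sketch, not how bcftools evaluates its expressions internally: the genotype-based alternate-allele fraction is an assumed stand-in for -q 0.8, SVLEN is compared as-is (without taking an absolute value, as in the original expression), and the three records are made-up examples rather than data from this study.

```python
# SV types accepted by the -i expression.
KEEP_TYPES = {"DUP", "INS", "DEL", "TRA", "INV", "INVDUP"}


def parse_info(info: str) -> dict:
    """Split a VCF INFO string like 'SVTYPE=DEL;SVLEN=120;PRECISE' into a dict."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item:
            out[item] = True  # flag without a value
    return out


def alt_fraction(gt: str) -> float:
    """Fraction of called alleles that are non-reference (1/1 -> 1.0, 0/1 -> 0.5)."""
    called = [a for a in gt.replace("|", "/").split("/") if a != "."]
    if not called:
        return 0.0
    return sum(a != "0" for a in called) / len(called)


def passes_filter(vcf_line: str) -> bool:
    """Approximate the bcftools filter for one single-sample VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    filt, info, fmt_keys, sample = cols[6], parse_info(cols[7]), cols[8], cols[9]
    fmt = dict(zip(fmt_keys.split(":"), sample.split(":")))
    svlen = int(info.get("SVLEN", 0))  # compared as-is, like SVLEN>29 in bcftools
    return (
        info.get("SVTYPE") in KEEP_TYPES
        and filt == "PASS"
        and int(fmt.get("DV", 0)) > 9  # FMT/DV>9: 10 or more supporting reads
        and 29 < svlen < 200_000  # SVLEN>29 && SVLEN<200000
        and alt_fraction(fmt.get("GT", ".")) >= 0.8  # assumed stand-in for -q 0.8
    )


# Made-up Sniffles-style records (CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE).
records = [
    "Fvb1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=120\tGT:DR:DV\t1/1:1:25",
    "Fvb1\t5000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=80\tGT:DR:DV\t0/1:12:11",
    "Fvb2\t7000\tsv3\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=250000\tGT:DR:DV\t1/1:0:40",
]
kept = [r.split("\t")[2] for r in records if passes_filter(r)]
print(kept)  # ['sv1']
```

Here sv2 is dropped because it is heterozygous, and sv3 because it exceeds the 200,000 bp cap.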
From the VCF listing all the SVs detected in our F. vesca accession, we generated a substituted version of the reference F. vesca genome v4.0.a2. The reference genome was first indexed with samtools faidx v1.11 (Danecek et al., 2021), and a sequence dictionary was generated with Picard CreateSequenceDictionary v2.25.6 (https://broadinstitute.github.io/picard). The VCF containing the SVs from our Nanopore sequencing was indexed with GATK IndexFeatureFile v4.2.0.0 (Van der Auwera & O'Connor, 2020) (https://gatk.broadinstitute.org/hc/en-us/articles/360037262651-IndexFeatureFile). GATK FastaAlternateReferenceMaker v4.2.0.0 (https://gatk.broadinstitute.org/hc/en-us/articles/360037594571-FastaAlternateReferenceMaker) was then run with the reference genome and the VCF file to generate a substituted genome representative of our Fragaria accession.
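The idea behind a substituted genome can be illustrated with a toy example: at each variant's 1-based position, the REF allele is replaced by the ALT allele. This is a minimal sketch that assumes explicit, non-overlapping REF/ALT alleles. It is not GATK's FastaAlternateReferenceMaker implementation, and the sequence, variants and function name apply_variants are hypothetical.

```python
def apply_variants(ref: str, variants: list[tuple[int, str, str]]) -> str:
    """Apply (POS, REF, ALT) edits with 1-based VCF positions to a reference string.

    Variants are applied from the last position to the first, so every edit
    uses coordinates of the original reference. Overlaps are not handled.
    """
    seq = ref
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        start = pos - 1  # VCF POS is 1-based
        if seq[start:start + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF mismatch at position {pos}")
        seq = seq[:start] + alt_allele + seq[start + len(ref_allele):]
    return seq


reference = "ACGTACGTACGTACGT"
variants = [
    (3, "GTAC", "G"),   # hypothetical 3 bp deletion (TAC) anchored at position 3
    (12, "T", "TGGG"),  # hypothetical 3 bp insertion (GGG) after position 12
]
print(apply_variants(reference, variants))  # ACGGTACGTGGGACGT
```

Applying edits right to left is a simple way to avoid recomputing positions after each indel; the total length stays 16 here only because the deletion and insertion cancel out.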
Because incorporating the detected SVs into the genome changes genomic coordinates, we also transferred the public GFF annotation of F. vesca (Li, Pi, Gao, Liu, & Kang, 2019) to the substituted genome using Liftoff v1.6.1 (Shumate & Salzberg, 2021). Liftoff also detects and annotates duplicated genes within the substituted genome.
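To illustrate why the annotation had to be transferred, the following sketch shifts 1-based coordinates past two hypothetical left-anchored indels. Liftoff itself aligns annotated features to the new assembly with minimap2 rather than applying arithmetic offsets, so this offset-based mapping is only a simplified stand-in; the function lift_coordinate, the indels and the gene spans are all hypothetical.

```python
from typing import Optional

# Hypothetical left-anchored indels as (POS, len(REF), len(ALT)), 1-based:
# a 3 bp deletion anchored at position 3 and a 3 bp insertion after position 12.
INDELS = [(3, 4, 1), (12, 1, 4)]


def lift_coordinate(x: int, indels: list[tuple[int, int, int]]) -> Optional[int]:
    """Map a 1-based reference position to the substituted genome.

    Returns None when the base was removed by a deletion.
    """
    shift = 0
    for pos, ref_len, alt_len in indels:
        ref_end = pos + ref_len - 1  # last reference base covered by REF
        if x > ref_end:
            shift += alt_len - ref_len  # downstream of the variant: shift it
        elif x >= pos + alt_len:
            return None  # inside the deleted part of REF
    return x + shift


# Hypothetical GFF-style gene spans (start, end), 1-based inclusive.
genes = {"geneA": (1, 2), "geneB": (8, 11), "geneC": (13, 16)}
lifted = {name: (lift_coordinate(s, INDELS), lift_coordinate(e, INDELS))
          for name, (s, e) in genes.items()}
print(lifted)  # {'geneA': (1, 2), 'geneB': (5, 8), 'geneC': (13, 16)}
```

geneB moves 3 bp upstream because it lies after the deletion only, while geneC lies after both indels, whose length changes cancel out.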
Transposable elements were annotated on the substituted genome with the EDTA transposable element annotation pipeline v1.9.6 (S. Ou et al., 2019) using default parameters.
Differentially methylated regions
The file Stress_vs_control_DMRs.zip contains the DMRs called from the reads submitted to ENA (ERP135585), obtained as follows:
First, bedGraph files from the WGBS pipeline were pre-filtered with awk to retain positions covered by at least 5 reads. These filtered files were then used as input for EpiDiverse/dmr, a bioinformatics pipeline for calling DMRs in non-model plant species (Nunn et al., 2021), run with default parameters: minimum coverage 5; maximum q-value 0.05; minimum difference in methylation level 10%; minimum of 10 Cs per DMR; and a minimum distance of 146 bp between Cs that are not to be considered part of the same DMR. The pipeline uses metilene v0.2.6.1 (https://www.bioinf.uni-leipzig.de/Software/metilene/) for pairwise comparisons between groups, and the R packages ggplot2 v3.3.5 and gplots v3.1.1 to visualize the results (Fig. S1). By overlapping methylated cytosines and DMRs with our F. vesca transcript annotation, we identified methylated genes, promoters, 5′ UTRs, 3′ UTRs and transposable elements in strawberry. Global DNA methylation and DMR plots were produced with the R package ggplot2. Gene-level methylation pattern analyses and per-family TE DNA methylation profiles were generated with deepTools v3.5.0 (Ramírez et al., 2014). DMRs were compared between treatments using the VennDiagram v1.7.0 R package.
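The coverage pre-filter and the default DMR thresholds quoted above can be written as two small predicates. This sketch assumes a six-column bedGraph layout (chrom, start, end, percent methylation, methylated read count, unmethylated read count) and a hypothetical CandidateDMR record whose methylation difference is stored as a fraction. It does not reproduce metilene's segmentation or statistics, only the final threshold checks, and all records shown are made up.

```python
from dataclasses import dataclass

MIN_COVERAGE = 5  # reads, as in the awk pre-filter described above


def coverage_ok(bedgraph_line: str, min_cov: int = MIN_COVERAGE) -> bool:
    """True if methylated + unmethylated read counts reach min_cov.

    Assumed column layout: chrom, start, end, percent, n_methylated, n_unmethylated.
    """
    cols = bedgraph_line.rstrip("\n").split("\t")
    return int(cols[4]) + int(cols[5]) >= min_cov


@dataclass
class CandidateDMR:  # hypothetical record type, not an EpiDiverse/metilene structure
    chrom: str
    start: int
    end: int
    q_value: float
    mean_diff: float  # difference in methylation level between groups, 0-1 scale
    n_cs: int


def is_dmr(c: CandidateDMR, max_q: float = 0.05, min_diff: float = 0.10,
           min_cs: int = 10) -> bool:
    """Apply the default thresholds: q <= 0.05, |difference| >= 10%, at least 10 Cs."""
    return c.q_value <= max_q and abs(c.mean_diff) >= min_diff and c.n_cs >= min_cs


lines = ["Fvb1\t100\t101\t80.0\t4\t1", "Fvb1\t200\t201\t33.3\t1\t2"]
kept_sites = [line for line in lines if coverage_ok(line)]
print(len(kept_sites))  # 1

candidates = [
    CandidateDMR("Fvb1", 100, 600, 0.01, 0.25, 14),
    CandidateDMR("Fvb1", 900, 1200, 0.01, 0.05, 20),
    CandidateDMR("Fvb2", 50, 400, 0.20, -0.30, 12),
]
print([c.start for c in candidates if is_dmr(c)])  # [100]
```

Taking the absolute difference keeps both hyper- and hypomethylated regions; the second candidate fails on effect size and the third on its q-value.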
We produced several genome browser tracks with the DMRs and integrated them into our local JBrowse instance, available at https://jbrowse.agroscope.info/jbrowse/?data=fragaria_sub
Created: 2023-03-01



