A de novo reference genome assembly and annotation for the leafy vegetable Eruca sativa ('salad' rocket). Eruca sativa reference genome and annotation

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://www.ncbi.nlm.nih.gov/bioproject/PRJEB34051

Resource description:

An inbred line of 'salad' rocket (Eruca sativa) was produced through single-seed descent for five generations. An individual plant was selected for next-generation genome sequencing, grown under controlled-environment conditions, and sampled at the first true-leaf stage. DNA was extracted and its quality assessed using the QC pipeline at the Earlham Institute (EI; Norwich, UK).

De novo genome sequencing and assembly were performed at EI using PCR-free paired-end (PE; 116.5x coverage) and long mate-pair (LMP; 37x coverage) sequencing. One PCR-free PE library was constructed from gDNA and sequenced (Illumina HiSeq2500; 250 bp PE reads). LMP sequencing was also conducted (Illumina MiSeq; 250 bp PE reads). After data QC and assembly of the high-coverage PE library, LMP libraries were mapped to determine their suitability for assembly improvement. Three additional libraries were selected and re-sequenced to a higher depth of coverage. FASTQ files were converted to BAM format using PicardTools and then assembled using the DISCOVAR de novo sequence assembler. LMP libraries were processed using NextClip to analyse the reads and create a high-quality subset for scaffolding the DISCOVAR-assembled sequences. SOAP and SSPACE were used to scaffold the DISCOVAR assembly using data from three of the NextClip-processed LMP read libraries.

Annotation was performed by Novogene Co. Ltd. (Hong Kong). A combined homology-based and de novo approach was taken to identify transposable elements (TEs). The homology-based approach used known repetitive sequence databases. De novo repeat libraries were created using LTR_FINDER, RepeatScout, and RepeatModeler. An integrated approach was taken to compute consensus gene structures. The homology-based approach compared the related genomes of A. lyrata, A. thaliana, B. napus, B. stricta, C. rubella, and R. sativus against E. sativa to find homologous sequences and predict gene structures. Ab initio statistical models (Augustus, GlimmerHMM, and SNAP) were used to predict genes and their intron-exon structures. EVidenceModeler (EVM) software was then used to combine ab initio predictions, protein and transcript alignments, and RNA-seq data into weighted consensus gene structures. Lastly, PASA was used to update the consensus predictions by adding UTR annotations and models for alternative splicing isoforms. All predicted proteins were functionally annotated using alignments to SwissProt, TrEMBL, KEGG, and InterPro.

The reference genome sequence was generated and used as part of a BBSRC LINK project (BB/N01894X/1) to improve genetic resources and knowledge of phytochemical metabolism in E. sativa.
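
The coverage figures quoted in the description (116.5x for the PE library, 37x for the LMP libraries) follow the usual relationship: depth = number of reads x read length / genome size. The Python sketch below illustrates that arithmetic only. The genome size used is a hypothetical placeholder, since this record does not state one, and the function names are illustrative rather than part of any tool named above.

```python
import math


def depth_of_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Nominal sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target nominal depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# ASSUMPTION: placeholder genome size for illustration only; the record
# does not report the E. sativa genome size.
HYPOTHETICAL_GENOME_SIZE_BP = 500_000_000
READ_LENGTH_BP = 250  # read length stated in the description

pe_reads = reads_needed(116.5, READ_LENGTH_BP, HYPOTHETICAL_GENOME_SIZE_BP)
pe_depth = depth_of_coverage(pe_reads, READ_LENGTH_BP, HYPOTHETICAL_GENOME_SIZE_BP)
print(f"reads for 116.5x at the assumed genome size: {pe_reads:,}")
print(f"resulting nominal depth: {pe_depth}x")
```

With a different genome size the read counts scale proportionally; the depths reported in the description are unaffected by the placeholder.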

Created:

2020-03-23