Genome assembly of Olea europaea subsp. cuspidata

Mendeley Data · Updated 2024-04-13 · Indexed 2024-06-27

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.t4b8gtj42

Description:

DNA and RNA sequencing was based on a single individual of Olea europaea subsp. cuspidata (OC) sampled at the Kunming Arboretum (N 25°9′13″, E 102°45′9″), Yunnan Academy of Forestry and Grassland, Yunnan Province, China. The plant was identified by Yong-Kang Sima, and a voucher specimen (Wu20056) was deposited in the Herbarium of the Yunnan Academy of Forestry and Grassland. Standard pre-sequencing procedures, including DNA and RNA extraction and Hi-C library construction, followed the requirements of the respective sequencers. In total, five tissues (leaves, roots, twigs, bark, and fruits) were used for RNA-seq on the Illumina platform. For DNA sequencing, ~50x genome coverage of short reads (300 bp, paired-end) and ~70x coverage of Nanopore long reads were obtained from the DNBSEQ-T7 and PromethION platforms, respectively. Raw reads were filtered with the fastp preprocessor. To achieve a chromosome-level assembly, we additionally generated ~130 Gb of 150 bp paired-end Hi-C reads on the DNBSEQ-T7 platform (MGI). We karyotyped OC to determine its chromosome number using cultivated roots, whose actively dividing meristems yield clearly visible mitotic chromosomes. Cells were treated with nitrous oxide to accumulate sufficient cells at mitotic metaphase, which were then stained with DAPI and labeled with the telomeric repeat sequence (TTTAGGG)6. Basecalling of the PromethION data was performed with Guppy. Only reads with a mean quality score >7 were retained, and these were further corrected with NextDenovo (https://github.com/Nextomics/NextDenovo) using the parameters "reads_cutoff:2k, seed_cutoff:18k". The assembly process comprised the NextCorrect correction module and the NextGraph assembly module, run with default parameters. The genome was then polished with NextPolish, using four rounds with short reads and three rounds with long reads (sgs_options = -max_depth 100).
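
The read-level filter described above (mean quality score >7) can be sketched in Python as follows. This is a minimal illustration, not the Guppy or NextDenovo implementation: the function names are hypothetical, and computing the per-read mean by averaging per-base error probabilities (then converting back to the Phred scale) is an assumption based on the convention used by common ONT quality-control tools.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string in Phred+33)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        yield header, seq, qual


def mean_read_qscore(quality: str, offset: int = 33) -> float:
    """Mean read quality on the Phred scale.

    Assumption: per-base error probabilities are averaged and the mean error
    is converted back to Phred, rather than averaging Phred values directly.
    """
    if not quality:
        raise ValueError("empty quality string")
    errors = [10 ** (-(ord(c) - offset) / 10) for c in quality]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_by_mean_q(records: Iterable[Record], min_mean_q: float = 7.0) -> Iterator[Record]:
    """Keep only reads whose mean quality is strictly greater than min_mean_q."""
    for header, seq, qual in records:
        if mean_read_qscore(qual) > min_mean_q:
            yield header, seq, qual


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "++++",   # uniform Q10
             "@r2", "AC", "+", "5!",       # Q20 and Q0
             "@r3", "ACG", "+", "&&&"]     # uniform Q5
    for header, _, _ in filter_by_mean_q(parse_fastq(fastq)):
        print(header)
```

Averaging error probabilities penalizes reads containing a few very poor bases: the read with qualities "5!" (Q20 and Q0) has an arithmetic Phred mean of 10 but an error-averaged mean of about 3, so only r1 survives the >7 threshold here.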
The paired-end Hi-C reads were filtered with fastp to remove adapters and low-quality reads (Phred score > 15 and fewer than 5 Ns per read). The clean reads and the draft genome were then analyzed with LACHESIS using the parameters "CLUSTER_MIN_RE_SITES = 100; CLUSTER_MAX_LINK_DENSITY = 2.5; CLUSTER_NONINFORMATIVE_RATIO = 1.4". Repeats were annotated with RepeatMasker using the parameters recommended in its manual. To aid gene annotation, a total of ~75 Gb of clean paired-end RNA-seq reads from five tissues (leaves, roots, twigs, bark, and fruits) were generated on the Illumina HiSeq platform. Each library was assembled de novo separately, and the assemblies were then merged following the Trans-ABySS manual pipeline. Structural annotation of protein-coding and non-coding genes was performed with the MAKER pipeline, integrating transcriptome mapping, de novo gene prediction, and homology-based prediction.
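
The Hi-C read filter can be approximated as below. This is a simplified sketch, not fastp's algorithm: fastp judges quality per base and by the fraction of unqualified bases, whereas here a read is assumed to pass if its mean Phred score exceeds 15 and it contains fewer than 5 Ns, and a pair is kept only if both mates pass. The function names and the mean-Phred rule are hypothetical.

```python
from typing import Tuple

Read = Tuple[str, str]  # (sequence, quality string in Phred+33)


def read_passes(read: Read, min_mean_phred: float = 15.0,
                max_n_exclusive: int = 5, offset: int = 33) -> bool:
    """Return True if the read has fewer than max_n_exclusive Ns and a mean
    Phred score strictly above min_mean_phred (simplified, hypothetical rule)."""
    seq, qual = read
    if not qual or len(seq) != len(qual):
        return False  # empty or malformed record
    if seq.upper().count("N") >= max_n_exclusive:
        return False  # too many ambiguous bases
    mean_phred = sum(ord(c) - offset for c in qual) / len(qual)
    return mean_phred > min_mean_phred


def pair_passes(mate1: Read, mate2: Read, **kwargs) -> bool:
    """Keep a Hi-C read pair only if both mates pass the per-read filter."""
    return read_passes(mate1, **kwargs) and read_passes(mate2, **kwargs)


if __name__ == "__main__":
    good = ("ACGTACGT", "IIIIIIII")     # Q40 throughout
    low_q = ("ACGTACGT", "++++++++")    # Q10 throughout
    n_rich = ("NNNNNACG", "IIIIIIII")   # 5 Ns
    print(pair_passes(good, good), pair_passes(good, low_q), pair_passes(n_rich, good))
```

Filtering at the pair level keeps the two mates synchronized, which Hi-C scaffolding requires because each contact is defined by where both ends of a pair map.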

Created: 2023-06-28
