LsRTDv1: A reference transcript dataset for accurate transcript-specific expression analysis in lettuce

Source: DataONE. Updated 2024-05-29; indexed 2024-06-08.
Download link: https://search.dataone.org/view/https://doi.org/10.5061/dryad.xwdbrv1m8
Description:
Accurate quantification of gene and transcript-specific expression, with the underlying knowledge of precise transcript isoforms, is crucial to understanding many biological processes. Analysis of RNA sequencing data has benefited from the development of alignment-free algorithms which enhance the precision and speed of expression analysis. However, such algorithms require a reference transcriptome. Here we present a reference transcript dataset (LsRTDv1) for lettuce, combining long- and short-read sequencing with publicly available transcriptome annotations, and filtering to keep only transcripts with high-confidence splice junctions and transcriptional start and end sites. LsRTDv1 is a valuable resource for the investigation of transcriptional and alternative splicing regulation in lettuce.

We generated a lettuce Reference Transcript Dataset (LsRTDv1) by integrating transcript assemblies from short- and long-read RNA sequencing data with existing lettuce genome annotations. RNA sequencing data was generated from 23 different lettuce samples capturing different tissues, ages of plant and treatments. The 23 samples, all from Lactuca sativa cv. Saladin (synonymous with cv. Salinas), were combined equally into 7 samples prior to sequencing.

Short-read assembly

The RNA-seq reads of the seven pooled samples were pre-processed with Fastp (Chen et al., 2018) to remove adapters and filter low-quality reads (quality score <20, length <30). Trimmed reads were mapped to the latest lettuce reference genome assembly in NCBI (Lsat_Salinas_v11) using STAR aligner in the 2-pass mode to increase the mapping sensitivity at splice junctions (SJs) (Dobin and Gingeras, 2015). Mismatch was set to 1, with minimum and maximum intron sizes of 60 and 15,000 bp respectively.
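The pre-processing and mapping settings described above could be expressed roughly as follows. This is a minimal sketch, not the authors' actual commands: it assumes paired-end gzipped reads, assumes that "quality score <20" corresponds to fastp's `--qualified_quality_phred`, that "mismatch was set to 1" corresponds to STAR's `--outFilterMismatchNmax`, and that 2-pass mapping was run via `--twopassMode Basic`. The helper names and all file paths are hypothetical, and the script only builds and prints the command lines.

```python
import shlex
from typing import List

# Thresholds taken from the dataset description.
MIN_QUALITY = 20      # reads filtered at quality score < 20
MIN_LENGTH = 30       # reads shorter than 30 bases are dropped
MAX_MISMATCHES = 1    # "Mismatch was set to 1"
INTRON_MIN = 60       # minimum intron size (bp)
INTRON_MAX = 15000    # maximum intron size (bp)


def fastp_command(r1: str, r2: str, out1: str, out2: str) -> List[str]:
    """Build a fastp call (hypothetical helper; paired-end input is an assumption)."""
    return [
        "fastp",
        "-i", r1, "-I", r2,
        "-o", out1, "-O", out2,
        "--qualified_quality_phred", str(MIN_QUALITY),  # assumed mapping of "quality score <20"
        "--length_required", str(MIN_LENGTH),
    ]


def star_command(genome_dir: str, r1: str, r2: str, prefix: str, threads: int = 4) -> List[str]:
    """Build a STAR 2-pass alignment call (hypothetical helper; flag mapping is an assumption)."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,
        "--readFilesCommand", "zcat",       # assumes gzipped FASTQ input
        "--twopassMode", "Basic",           # assumed way of running "2-pass mode"
        "--outFilterMismatchNmax", str(MAX_MISMATCHES),
        "--alignIntronMin", str(INTRON_MIN),
        "--alignIntronMax", str(INTRON_MAX),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # Placeholder paths for one of the seven pooled samples.
    trim = fastp_command("pool1_R1.fastq.gz", "pool1_R2.fastq.gz",
                         "pool1_R1.trim.fastq.gz", "pool1_R2.trim.fastq.gz")
    align = star_command("star_index_Lsat_Salinas_v11",
                         "pool1_R1.trim.fastq.gz", "pool1_R2.trim.fastq.gz",
                         "pool1_")
    print(shlex.join(trim))
    print(shlex.join(align))
```

Building the argument lists separately from running them makes it easy to inspect the exact parameters for each pooled sample before passing them to `subprocess.run`.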
Two transcript assembl...

# LsRTDv1: A reference transcript dataset for accurate transcript-specific expression analysis in lettuce

[https://doi.org/10.5061/dryad.xwdbrv1m8](https://doi.org/10.5061/dryad.xwdbrv1m8)

The genome assembly of cultivated lettuce was published in 2017 (Reyes-Chin-Wo et al., 2017) with an updated genome version (version 11) available on NCBI ([https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002870075.4/](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002870075.4/)). Here, we introduce the first lettuce reference transcript dataset (LsRTDv1) integrating long-read Iso-seq and short-read RNA-seq of diverse tissue and treatment samples from lettuce with the GenBank and RefSeq transcript annotations, using stringent quality measures. The final LsRTDv1 includes 179,404 non-redundant transcripts encoded by 65,724 genes, greatly expanding the existing lettuce transcriptome and increasing the number of transcripts per gene from 1.4 to 2.7. LsRTDv1 identifies 3,696 novel gene models, predomin...
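As a quick consistency check on the figures quoted in the README excerpt, the transcripts-per-gene ratio can be recomputed from the stated transcript and gene counts. This is only an illustrative calculation; the variable names are not from the dataset.

```python
# Counts quoted in the README excerpt.
n_transcripts = 179_404
n_genes = 65_724

# Mean number of transcripts per gene in LsRTDv1.
transcripts_per_gene = n_transcripts / n_genes
print(f"{transcripts_per_gene:.1f} transcripts per gene")
```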