Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome

Figshare · 2016-10-31 · Indexed, updated 2026-04-29

Download link:

https://figshare.com/articles/dataset/Optimization_of_De_Novo_Short_Read_Assembly_of_Seabuckthorn_Hippophae_rhamnoides_L_Transcriptome/779002

Description:

Seabuckthorn (Hippophae rhamnoides L.) has been known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next-generation massively parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of the seabuckthorn transcriptome. We assembled 86,253,874 high-quality short reads using six assembly tools. In our hands, a two-step procedure for assembling non-redundant short reads gave the best results across various assembly quality parameters: first, ABySS was run using an additive k-mer approach, and the assembled transcripts were then processed with the TGICL suite. This de novo short-read assembly yielded 88,297 transcripts (> 100 bp), representing about 53 Mb of the seabuckthorn transcriptome. The average transcript length was 610 bp, the N50 length was 1,198 bp, and 91% of the short reads mapped uniquely back to the seabuckthorn transcriptome. A total of 41,340 transcripts (46.8%) showed significant similarity to sequences in the NCBI non-redundant (nr) protein database (E-value threshold 1E-06). We also screened the assembled transcripts for transcription factors and simple sequence repeats (SSRs). Our strategy of using a short-read assembler (ABySS) followed by TGICL will be useful for researchers working with the transcriptome of a non-model organism, saving time and reducing the complexity of data management. The seabuckthorn transcriptome data generated here provide a valuable resource for gene discovery and the development of functional molecular markers.
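
The assembly statistics quoted above (transcript count after a length cutoff, total assembled length, mean length and N50) follow standard definitions. As a minimal sketch, assuming only a list of transcript lengths is available, the function below computes them using the usual N50 definition: the length L such that transcripts of length L or longer cover at least half of all assembled bases. The function name and the toy lengths are hypothetical illustrations, not part of the study's own pipeline.

```python
from typing import Dict, Iterable


def assembly_stats(lengths: Iterable[int], min_len: int = 100) -> Dict[str, float]:
    """Count, total, mean and N50 of transcripts longer than min_len bp."""
    # Keep only transcripts above the length cutoff, longest first.
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    total = sum(kept)

    # N50: walk down the sorted lengths until at least half of all bases are covered.
    n50 = 0
    covered = 0
    for n in kept:
        covered += n
        if 2 * covered >= total:
            n50 = n
            break

    return {
        "count": len(kept),
        "total_bp": total,
        "mean_bp": total / len(kept) if kept else 0.0,
        "n50_bp": n50,
    }


# Toy lengths (hypothetical); the 80 bp transcript falls below the > 100 bp cutoff.
print(assembly_stats([150, 200, 300, 400, 500, 80]))
# {'count': 5, 'total_bp': 1550, 'mean_bp': 310.0, 'n50_bp': 400}
```

The same kind of arithmetic gives a quick consistency check on the reported annotation rate: `round(100 * 41_340 / 88_297, 1)` evaluates to 46.8, matching the 46.8% stated in the abstract.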

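
The abstract does not state which SSR finder or thresholds were used for the simple sequence repeat screen. As a rough illustration of what such a screen involves, the sketch below finds perfect mono- to hexanucleotide repeats in a transcript with a regular-expression backreference. The minimum repeat counts in the hypothetical `MIN_REPEATS` table are assumed values chosen for illustration, not the study's settings, and compound or imperfect repeats are ignored.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass
class SSR:
    motif: str
    repeats: int
    start: int  # 0-based start position in the transcript
    end: int    # 0-based exclusive end position


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Find perfect SSRs, one regex pass per motif length."""
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_rep in sorted(min_repeats.items()):
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AA' as a dinucleotide motif is really a mononucleotide run
            hits.append(SSR(motif, (m.end() - m.start()) // k, m.start(), m.end()))
    return sorted(hits, key=lambda h: h.start)


print(find_ssrs("GGG" + "AT" * 7 + "CCC" + "A" * 12 + "G"))
# [SSR(motif='AT', repeats=7, start=3, end=17), SSR(motif='A', repeats=12, start=20, end=32)]
```

Filtering out non-primitive motifs keeps a long mononucleotide run from also being reported as a di- or tetranucleotide repeat; dedicated SSR tools handle further cases (interrupted and compound repeats) that this sketch does not.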

Created: 2016-10-31