Supporting data for "Comprehensive evaluation of RNA-Seq analysis pipelines in diploid and polyploid species"
Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15.
Download link:
http://gigadb.org/dataset/100517
Description:
The usual analysis of RNA-Seq reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-Seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and a polyploid species with a distant reference genome, we tested the influence of different tools at various steps of a typical RNA-Seq analysis workflow on the recovery of useful processed data available for downstream analysis. At the preprocessing step, we found that error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis when gentle quality trimming was performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with the size, quality and completeness of de novo assemblies, and with the number of mapped reads. When a reference genome from a related species was selected for mapping reads, the outcome was significantly improved by using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP. The selection of bioinformatic software tools for RNA-Seq data analysis can maximize quality parameters of de novo assemblies and the availability of reads in downstream analysis.
Provider:
GigaScience Database
Created:
2018-10-18



