
Effect of method of deduplication on estimation of differential gene expression using RNA-seq. Homo sapiens

NIAID Data Ecosystem, indexed 2026-03-09
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA354977
Description:
RNA-seq is a useful tool for the analysis of gene expression. However, its robustness is greatly affected by several artifacts, one of which is the presence of duplicated reads. To infer the influence of different methods of duplicate-read removal on the estimation of gene expression in cancer genomics, we analyzed samples of normal liver tissue and hepatocellular carcinoma. For each sample, four data-analysis protocols were applied: processing without deduplication, deduplication with the method implemented in samtools, and deduplication based on one or two unique molecular identifiers (UMIs). We also analyzed the influence of sequencing layout (single-read or paired-end) and read length. We found that deduplication without UMIs greatly alters estimated expression values; this effect is most pronounced for highly expressed genes. The use of UMIs greatly improves the accuracy of RNA-seq analysis, especially for highly expressed genes. We developed a set of scripts that enable the handling of UMIs and their incorporation into RNA-seq analysis pipelines. Deduplication without UMIs alters the results of differential gene expression analysis, creating a high fraction of false negatives, whereas omitting duplicate-read removal biases the results toward false positives. Where the use of UMIs is not possible, we recommend a paired-end sequencing layout.
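The difference between the deduplication protocols described above can be illustrated with a toy model. The sketch below is not the authors' scripts and does not reproduce the samtools algorithm; the `Read` record, the function names, and the example reads are all hypothetical. It assumes that coordinate-based deduplication keeps one read per mapping position and strand, and that UMI-based deduplication additionally keys on the UMI sequence.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus a UMI tag.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])


def dedup_by_position(reads):
    """Keep the first read seen at each (chrom, pos, strand); UMIs are ignored."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_umi(reads):
    """Keep the first read seen at each (chrom, pos, strand, umi)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy data for a highly expressed gene: four reads share one mapping position,
# but only two of them carry the same UMI (i.e. are likely PCR duplicates).
reads = [
    Read("chr1", 1000, "+", "AAA"),
    Read("chr1", 1000, "+", "AAA"),
    Read("chr1", 1000, "+", "CCG"),
    Read("chr1", 1000, "+", "TTG"),
]

no_dedup_count = len(reads)
position_count = len(dedup_by_position(reads))
umi_count = len(dedup_by_umi(reads))
```

In this toy case the three counts differ: skipping deduplication counts the PCR duplicate as an extra molecule, while position-only deduplication collapses reads that the UMIs indicate came from distinct molecules, which is consistent with the underestimation reported for highly expressed genes.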
Created:
2016-11-26