Differential Gene and Transcript Expression Analysis with TopHat and Cufflinks
Source: NIAID Data Ecosystem. Indexed: 2026-03-10
Download link:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32038
Description:
This submission includes the sample data for a protocol covering differential expression analysis with TopHat and Cufflinks. The protocol also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-Seq analysis results. While the procedure assumes basic informatics skills, these tools assume little to no background with RNA-Seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results.

The example data was generated in silico to closely resemble a real experiment in Drosophila melanogaster. First, expression values in cultured S2 cells were calculated for FlyBase 5.2 transcripts. These values were used to generate three sequencing replicates for condition "C1", with the underlying variability in expression across replicates simulated by fitting a negative binomial model to the real S2 read count data. A second simulated condition, "C2", was generated by perturbing expression for 300 randomly selected genes. Each gene was perturbed by selecting its most highly expressed isoform and increasing that isoform's relative expression threefold. Three replicates of this condition were sequenced as above.

Simulated sequencing was performed by picking a transcript from the FlyBase transcriptome with probability proportional to its abundance, choosing a fragment length from a normal distribution with mean = 180 bp and standard deviation = 20 bp, and then choosing a start point for the fragment within the transcript uniformly at random. Total sequencing yield for each replicate was chosen to match that of the real S2 data. Each replicate was mapped separately to the fly genome with TopHat v1.3.1. The replicates were assembled separately with Cufflinks v1.1.0. The replicate assemblies were merged with Cuffmerge. This merged assembly was then analysed for differentially expressed and regulated genes with Cuffdiff.
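The simulated sequencing step described above can be illustrated with a minimal Python sketch. This is not the authors' simulator: the function name, the `(name, length, abundance)` tuple layout, and the example transcripts are hypothetical. The description also does not say what happens when a drawn fragment is longer than the chosen transcript, so redrawing in that case is an assumption made here.

```python
import random


def sample_fragment(transcripts, rng, mean_len=180.0, sd_len=20.0):
    """Draw one simulated fragment as (transcript_name, start, fragment_length).

    transcripts: list of (name, length_bp, abundance) tuples (hypothetical layout).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    weights = [abundance for _, _, abundance in transcripts]
    while True:
        # Pick a transcript with probability proportional to its abundance.
        name, length, _ = rng.choices(transcripts, weights=weights, k=1)[0]
        # Fragment length ~ Normal(mean = 180 bp, sd = 20 bp), rounded to whole bases.
        frag_len = int(round(rng.gauss(mean_len, sd_len)))
        # Assumption: redraw if the fragment does not fit inside the transcript.
        if frag_len < 1 or frag_len > length:
            continue
        # Start position chosen uniformly among all positions where the fragment fits.
        start = rng.randint(0, length - frag_len)
        return name, start, frag_len


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical transcripts: (name, length in bp, relative abundance).
    toy = [("tx_a", 1000, 9.0), ("tx_b", 500, 1.0)]
    for _ in range(3):
        print(sample_fragment(toy, rng))
```

Repeating the draw until the target yield for a replicate is reached would give one simulated library. The replicate-to-replicate variability from the negative binomial model is not covered by this sketch.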
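The mapping, assembly, merge, and differential expression steps can be laid out as a sequence of command lines. The Python sketch below only builds the argument lists and does not run anything. The file names, the Bowtie index name `genome`, and the use of one reads file per replicate are assumptions for illustration. The output names (`accepted_hits.bam`, `transcripts.gtf`, `merged_asm/merged.gtf`) are the tools' usual defaults and should be checked against the installed versions.

```python
def build_pipeline(conditions, index="genome"):
    """Return (commands, assemblies) for a TopHat/Cufflinks run.

    conditions: dict mapping a condition label to a list of replicate read files.
    index: Bowtie index prefix for the reference genome (hypothetical name).
    """
    commands = []
    assemblies = []  # one transcripts.gtf path per replicate, for Cuffmerge
    bams = {}        # condition label -> list of alignment files
    for cond, read_files in conditions.items():
        bams[cond] = []
        for i, reads in enumerate(read_files, start=1):
            sample = f"{cond}_R{i}"
            # Map each replicate separately with TopHat.
            commands.append(["tophat", "-o", f"{sample}_thout", index, reads])
            bam = f"{sample}_thout/accepted_hits.bam"
            # Assemble each replicate separately with Cufflinks.
            commands.append(["cufflinks", "-o", f"{sample}_clout", bam])
            assemblies.append(f"{sample}_clout/transcripts.gtf")
            bams[cond].append(bam)
    # Merge the replicate assemblies; assemblies.txt lists one GTF path per line.
    commands.append(["cuffmerge", "assemblies.txt"])
    # Test for differential expression; replicates of one condition are comma-joined.
    commands.append(
        ["cuffdiff", "-o", "diff_out", "-L", ",".join(bams), "merged_asm/merged.gtf"]
        + [",".join(paths) for paths in bams.values()]
    )
    return commands, assemblies


if __name__ == "__main__":
    cmds, asm = build_pipeline({
        "C1": ["C1_R1.fq", "C1_R2.fq", "C1_R3.fq"],
        "C2": ["C2_R1.fq", "C2_R2.fq", "C2_R3.fq"],
    })
    for cmd in cmds:
        print(" ".join(cmd))
```

To execute the plan, the `assemblies` list would first be written to `assemblies.txt`, one path per line, and each command passed to `subprocess.run` in order.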
Created:
2019-02-15



