A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium
Source: NIAID Data Ecosystem. Indexed: 2026-03-07
Download link:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA208369
Resource description:
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for sequence discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these and for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Overall design: The study used the well-characterized reference RNA samples A (pooled cell lines) and B (human brain) from the MAQC consortium, with added spike-ins of synthetic RNA from the External RNA Control Consortium (ERCC). Samples C and D were then constructed by combining A and B in known mixing ratios, 3:1 and 1:3, respectively. All samples were distributed to several independent sites for RNA-seq library construction and profiling on the Illumina HiSeq 2000 and LifeTech SOLiD 5500 platforms. In addition, vendors created their own cDNA libraries, which were then distributed to each test site in order to examine the degree of a "site effect" that was independent of the library preparation process. To support an assessment of gene models, samples A and B were also sequenced at independent sites on the Roche 454 platform, providing longer reads. For comparison to other technologies, these data were compared to the MAQC-I Affymetrix U133 Plus2 microarray and several current microarray platforms, and were also assessed by 20,801 PrimePCR reactions.
Sample A: Universal Human Reference RNA (UHRR) from Stratagene and ERCC Spike-In controls
Sample B: Human Brain Reference RNA (HBRR) from Ambion and ERCC Spike-In controls
Sample C: Mix of A and B (3:1)
Sample D: Mix of A and B (1:3)
Sample E: Ambion ERCC Spike-In Control Mix 1
Sample F: Ambion ERCC Spike-In Control Mix 2
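The known 3:1 and 1:3 mixing ratios for samples C and D mean that a measurement in A and B implies an expected value in C and D. The following is a minimal sketch of that arithmetic, not the consortium's analysis method. It assumes the measured signal scales linearly with the mixing proportion, which ignores any difference in mRNA content between A and B. The function names and the expression values are hypothetical and chosen only for illustration.

```python
def expected_mix(a: float, b: float, parts_a: int, parts_b: int) -> float:
    """Expected signal for a parts_a:parts_b mixture of samples A and B.

    Assumption: signal is a linear, proportion-weighted average of A and B.
    """
    total = parts_a + parts_b
    return (parts_a * a + parts_b * b) / total


def titration_consistent(a: float, c: float, d: float, b: float) -> bool:
    """True if the values are ordered monotonically A, C, D, B (either direction)."""
    return (a >= c >= d >= b) or (a <= c <= d <= b)


# Hypothetical expression values for one gene in samples A and B.
a, b = 120.0, 40.0

c = expected_mix(a, b, 3, 1)  # Sample C: A and B mixed 3:1
d = expected_mix(a, b, 1, 3)  # Sample D: A and B mixed 1:3

print(c, d)                              # 100.0 60.0
print(titration_consistent(a, c, d, b))  # True
```

Under this assumption, a gene that is higher in A than in B should be measured in the order A, C, D, B, so an out-of-order measurement flags a possible inconsistency at that site or in that pipeline.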
Created:
2013-06-10



