The Cancer Genome Atlas (TCGA) RNA-seq meta-analysis
收藏Figshare2018-02-02 更新2026-04-08 收录
下载链接:
https://figshare.com/articles/The_Cancer_Genome_Atlas_TCGA_RNA-seq_meta-analysis/5851743/1
下载链接
链接失效反馈官方服务:
资源简介:
TCGA RNA-seq V2 Level3 data were downloaded from TCGA Genomic Data Commons Data Portal (https://gdc-portal.nci.nih.gov), consisting of 11,303 samples in 34 cancer projects (33 cancer types). Nine cancer types that do not have corresponding non-tumour samples were filtered out, and the analysis was focused on tumour versus non-tumour comparison. 24 cancer types were used in this meta-analysis: BLCA, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, THCA, THYM, UCEC (https://gdc-portal.nci.nih.gov). The nine filtered cancer types were ACC, DLBC, LAML, LGG, MESO, OV, TGCT, UCS and UVM. <br><br>To extract expression values from TCGA RNA-seq data, we used genomic coordinates to retrieve UCSC Transcript IDs that correspond to the identifiers in TCGA RNA-seq V2 Level3 data (isoform level). The GAF (General Annotation Format) file was used to map the coordinate to UCSC Transcript ID, and it was downloaded form https://tcga-data.nci.nih.gov/docs/GAF/GAF.hg19.June2011.bundle/outputs/TCGA.hg19.June2011.gaf. This file contains genomic annotations shared by all TCGA projects. More details of the GAF file format can be found at https://tcga-data.nci.nih.gov/docs/GAF/GAF3.0/GAF_v3_file_description.docx. We filtered out any coding exons overlapping UCSC Transcript IDs to eliminate expression value of coding genes and evaluate lncRNA expression.<br>We could find the expression values of 443 pcRNAs and 203 tapRNAs in TCGA data, as many of non-coding regions are not yet fully annotated in the TCGA RNA-seq V2 Level3 data. The expression value of pcRNAs and tapRNAs were extracted and clustered by un-supervised Pearson correlation method (Supplementary Figure 18A). The expression values of tapRNA-associated coding genes were also extracted and used to generate the heat-map (Supplementary Figure 18B), which shows the similar pattern of expression with tapRNAs across the cancer types.<br>To show that tapRNAs and associated coding genes have similar expression profiles in cancers we generated a Spearman's Rank-Order Correlation heatmap (Figure 6A) between tapRNAs and their associated coding genes based on the TCGA RNA-seq data. We used the MatLab function corr to calculate the Spearman's rho. This function takes two matrices X (197-by-8,850 expression profiling matrix of tapRNA) and Y (197-by-8,850 expression profiling matrix of tapRNA-assocated coding gene) and returns an 8,850-by-8,850 matrix containing the pairwise correlation coefficient between each pair of 8,850 columns (TCGA cancer samples in Supplementary Figure 18A and B). Thus, the rank-order correlation matrix that we computed from the matrices of expression profiling data (Supplementary Figure S18A and B) allowed us to compare the correlation between two column vectors i.e. cancer samples. This function also returns a matrix of p-values for testing the hypothesis of no correlation against the alternative that there is a nonzero correlation. Each element of a matrix of p-values is the p value for the corresponding element of Spearman's rho. The p-values for Spearman's rho are calculated using large-sample approximations. To check significance level of correlation between tapRNA and its associated coding gene, the diagonal of the p-value matrix was extracted and used. The median is 1.31x10-11 and the mean is 1.03x10-4 with standard deviation 0.0029.<br>To identify cancer-specific tapRNAs, we considered not only the global expression pattern of a given tapRNA in each cancer type, but also expression pattern of specific sub-group that is significantly distinct, to take into account cancer sample heterogeneity. Thus, two conditions were applied: (1) average expression level of a tapRNA in a given cancer type is in top 10% or bottom 10% and (2) a tapRNA has at least 10% of samples in a given cancer type that are significantly up-regulated (Z-score > 2) or down-regulated (Z-score < -2). <br>
提供机构:
Namshik Han
创建时间:
2018-02-02



