TCGA33Tumors
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/FZYERP
Description:
The tcga33tumors dataset is a collection of FPKM-normalized RNA-Seq gene expression values covering 10,078 samples across 33 tumor types, with 36,017 features. The data were extracted from the TCGA repository using the TCGAbiolinks package in R, in a Bioconductor-enabled environment. All R code needed to reproduce the dataset is included.

The dataset is distributed as two archives. The second archive partitions the features into four files by feature type.

File tcga33main.tar.gz contents:
1. data.csv (10,078 samples x 36,017 features)
2. labels.csv (class labels for 10,078 samples)

File tcga33tumors.tar.gz contents:
1. data-mRNA.csv (data partition with 17,659 mRNA features)
2. data-miRNA.csv (data partition with 639 miRNA features)
3. data-lncRNA.csv (data partition with 8,961 lncRNA features)
4. data-othRNA.csv (data partition with 8,758 other RNA features)
5. labels.csv (class labels for 10,078 samples)
6. tumors.csv (short names of the 33 tumors, in the same order as in labels)

The four partitions together account for all 36,017 features (17,659 + 639 + 8,961 + 8,758).

This dataset can be used for tumor classification and identification, gene co-expression network analysis, protein-protein interaction network analysis, and similar tasks. The data are provided in CSV format, with sample IDs as row headers and feature IDs as column headers, following the TCGA standard format.

Please feel free to reach out if you encounter any bugs or have suggestions for improvements.
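As an illustration of the layout described above, the following is a minimal Python sketch of how the four feature partitions might be joined back into one table keyed by sample ID. It is not part of the dataset's own R code. The helper names `read_partition` and `combine_partitions` are hypothetical, and the sketch assumes that the first column of each CSV holds the sample ID and that the header row holds the feature IDs. The demo runs on tiny in-memory stand-ins rather than the real files.

```python
import csv
import io

# Feature counts per partition, as stated in the dataset description.
EXPECTED_FEATURES = {"mRNA": 17659, "miRNA": 639, "lncRNA": 8961, "othRNA": 8758}
EXPECTED_TOTAL = 36017


def read_partition(handle):
    """Read one CSV partition.

    Assumption: column 0 is the sample id, the header row lists feature ids.
    Returns (feature_ids, {sample_id: [values...]}).
    """
    reader = csv.reader(handle)
    header = next(reader)
    features = header[1:]
    rows = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return features, rows


def combine_partitions(partitions):
    """Concatenate partitions column-wise, requiring identical sample order."""
    all_features = []
    combined = {}
    sample_ids = None
    for name, (features, rows) in partitions.items():
        if sample_ids is None:
            sample_ids = list(rows)
            combined = {sid: [] for sid in sample_ids}
        elif list(rows) != sample_ids:
            raise ValueError(f"sample ids in partition {name!r} do not match")
        all_features.extend(features)
        for sid in sample_ids:
            combined[sid].extend(rows[sid])
    return all_features, combined


# Tiny synthetic stand-ins for data-mRNA.csv and data-miRNA.csv.
mrna = read_partition(io.StringIO("sample,g1,g2\ns1,1.0,2.0\ns2,3.0,4.0\n"))
mirna = read_partition(io.StringIO("sample,m1\ns1,0.5\ns2,0.7\n"))

features, table = combine_partitions({"mRNA": mrna, "miRNA": mirna})
partition_total = sum(EXPECTED_FEATURES.values())
```

With the real files, each partition would be opened with something like `open("data-mRNA.csv", newline="")` after extracting tcga33tumors.tar.gz. At 10,078 x 36,017 values, a dataframe library is likely more practical than plain lists; the standard-library version here only shows the structure.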
Created:
2025-01-10



