TCGA33Tumors

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://doi.org/10.7910/DVN/FZYERP
Description:
The TCGA33Tumors dataset is a collection of FPKM-normalized RNA-Seq gene expression values covering 10,078 samples across 33 tumor types, with 36,017 features. The data were extracted from the TCGA repository with the TCGAbiolinks package in a Bioconductor-enabled R environment. All R code needed to reproduce the dataset is included.

The dataset is distributed as two archives. The first holds the full expression matrix; the second splits the 36,017 features into four files by feature type (17,659 + 639 + 8,961 + 8,758 = 36,017).

File tcga33main.tar.gz contents:
1. data.csv (10,078 samples x 36,017 features)
2. labels.csv (class labels for the 10,078 samples)

File tcga33tumors.tar.gz contents:
1. data-mRNA.csv (partition with 17,659 mRNA features)
2. data-miRNA.csv (partition with 639 miRNA features)
3. data-lncRNA.csv (partition with 8,961 lncRNA features)
4. data-othRNA.csv (partition with 8,758 other RNA features)
5. labels.csv (class labels for the 10,078 samples)
6. tumors.csv (short names of the 33 tumors, in the same order as in the labels)

All files are in CSV format following the standard TCGA layout: each row is a sample (row headers are sample ids) and each column is a feature (column headers are feature ids). The dataset can be used for tumor classification and identification, gene co-expression network analysis, protein-protein interaction network analysis, and similar tasks. Please feel free to reach out if you encounter any bugs or have suggestions for improvements.
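A minimal sketch for sanity-checking the extracted files against the counts stated above. This is a hypothetical Python helper, not part of the dataset's own R code. It assumes each data file has one header row of feature ids with the sample-id column first, and that the class label is the last field of each row in labels.csv; the exact layout of labels.csv is not documented here, so `has_header` and the column choice are assumptions to adjust. The functions accept any iterable of CSV lines (such as an open file) and stream rows instead of loading the full matrix into memory.

```python
import csv
from collections import Counter
from pathlib import Path

# Feature counts per partition, as stated in the dataset description.
EXPECTED_PARTITIONS = {
    "data-mRNA.csv": 17_659,
    "data-miRNA.csv": 639,
    "data-lncRNA.csv": 8_961,
    "data-othRNA.csv": 8_758,
}
EXPECTED_TOTAL = 36_017  # feature columns in data.csv


def count_features(lines):
    """Count feature columns in a samples-by-features CSV.

    Assumption: the first column holds the sample ids. Handles both R header
    styles: write.csv() puts an empty corner cell above the row names, while
    write.table() omits it, so data rows are one field longer than the header.
    """
    reader = csv.reader(lines)
    header = next(reader)
    first_row = next(reader, None)
    if first_row is not None and len(first_row) == len(header) + 1:
        return len(header)
    return len(header) - 1


def count_samples(lines):
    """Count non-empty data rows (samples), skipping the header of feature ids."""
    reader = csv.reader(lines)
    next(reader)
    return sum(1 for row in reader if row)


def class_counts(lines, has_header=True):
    """Tally samples per class.

    Assumption: the label is the last field of each row in labels.csv.
    """
    rows = [row for row in csv.reader(lines) if row]
    if has_header:
        rows = rows[1:]
    return Counter(row[-1] for row in rows)


def check_partitions(directory):
    """Return {file: (expected, found)} for partition files whose count differs."""
    mismatches = {}
    for name, expected in EXPECTED_PARTITIONS.items():
        with open(Path(directory) / name, newline="") as fh:
            found = count_features(fh)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# The four partitions together should account for every feature in data.csv.
assert sum(EXPECTED_PARTITIONS.values()) == EXPECTED_TOTAL
```

For example, after extracting tcga33tumors.tar.gz into a directory (the name is up to you), `check_partitions(directory)` should return an empty dict if the files match the description, and the values returned by `class_counts` on labels.csv should sum to 10,078.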
Created: 2025-01-10