TCGA33Tumors

Name: TCGA33Tumors
Creator: Harvard Dataverse
Published: 2025-01-10 13:03:48
License: No description available

DataCite Commons (updated 2025-01-10, indexed 2025-04-15)

Download link:

https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/FZYERP

Description:

The TCGA33Tumors dataset is a collection of RNA-Seq gene expression values (FPKM normalized) containing 10,078 samples across 33 tumor types with 36,017 features. The data were extracted from the TCGA repository using the TCGAbiolinks package in R under a Bioconductor-enabled environment. All R code needed to reproduce the dataset is included in the dataset. The data are also provided partitioned into four files by feature type.

File tcga33main.tar.gz contents:
1. data.csv (10,078 samples x 36,017 features)
2. labels.csv (class labels for 10,078 samples)

File tcga33tumors.tar.gz contents:
1. data-mRNA.csv (data partition with 17,659 mRNA features)
2. data-miRNA.csv (data partition with 639 miRNA features)
3. data-lncRNA.csv (data partition with 8,961 lncRNA features)
4. data-othRNA.csv (data partition with 8,758 other RNA features)
5. labels.csv (class labels for 10,078 samples)
6. tumors.csv (short names of the 33 tumor types, in the same order as in the labels)

This dataset can be used for tumor classification and identification, gene co-expression network analysis, protein-protein interaction network analysis, and similar tasks. The data are in CSV format, where row headers are sample IDs and column headers are feature IDs, following the TCGA standard format. Please feel free to reach out if you encounter any bugs or have suggestions for improvements.
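
The feature counts quoted above can be checked after extracting tcga33tumors.tar.gz. The sketch below is a minimal, unofficial example: it reads only the header row of each partition file and compares the number of feature columns with the counts stated in the description. It assumes each CSV has a header row whose first cell belongs to the sample-id column; if a file was written without a header cell for the row names, the `- 1` would need to be dropped. The names `EXPECTED_FEATURES`, `count_features`, and `check_partitions` are illustrative and are not part of the dataset's R code.

```python
import csv
from pathlib import Path

# Feature counts per partition, as stated in the dataset description.
EXPECTED_FEATURES = {
    "data-mRNA.csv": 17_659,
    "data-miRNA.csv": 639,
    "data-lncRNA.csv": 8_961,
    "data-othRNA.csv": 8_758,
}
EXPECTED_TOTAL = 36_017  # width of data.csv in tcga33main.tar.gz


def count_features(csv_path):
    """Count feature columns from the header row only.

    Assumption: the first header cell belongs to the sample-id column,
    so it is not counted as a feature.
    """
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    return len(header) - 1


def check_partitions(data_dir, expected=EXPECTED_FEATURES):
    """Return {file name: (found, expected)} for files whose width differs."""
    data_dir = Path(data_dir)
    mismatches = {}
    for name, want in expected.items():
        found = count_features(data_dir / name)
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


# The four partitions should add up to the full feature set.
assert sum(EXPECTED_FEATURES.values()) == EXPECTED_TOTAL
```

Calling `check_partitions("path/to/extracted/files")` returns an empty dictionary when every partition has the stated width. Reading only the header keeps the check fast, since the full matrices do not need to be loaded.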

Provider:

Harvard Dataverse

Created:

2024-09-28
