TCGA33Tumors
DataCite Commons | Updated 2025-01-10 | Indexed 2025-04-15
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/FZYERP
Description:
The tcga33tumors dataset is a collection of FPKM-normalized RNA-Seq gene expression values comprising 10,078 samples across 33 tumor types with 36,017 features. The data were extracted from the TCGA repository using the TCGAbiolinks package in R in a Bioconductor-enabled environment. All R code needed to reproduce the dataset is included. The dataset is partitioned into four files by feature type.
File tcga33main.tar.gz contents:
1. data.csv (10,078 samples x 36,017 features)
2. labels.csv (class labels for 10,078 samples)
File tcga33tumors.tar.gz contents:
1. data-mRNA.csv (data partition with 17,659 mRNA features)
2. data-miRNA.csv (data partition with 639 miRNA features)
3. data-lncRNA.csv (data partition with 8,961 lncRNA features)
4. data-othRNA.csv (data partition with 8,758 other RNA features)
5. labels.csv (class labels for 10,078 samples)
6. tumors.csv (short names of the 33 tumor types, in the same order as in the labels)
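
As a quick consistency check, the four partition sizes listed above should add up to the 36,017 features of the full matrix (17,659 + 639 + 8,961 + 8,758 = 36,017). The Python sketch below counts features by reading only the header row of each partition file. It assumes the archive has already been extracted, that each CSV begins with a header row, and that the first column holds the sample ID. The function names are illustrative and not part of the dataset.

```python
import csv
from pathlib import Path

# Partition file names as listed in the dataset description.
PARTITIONS = [
    "data-mRNA.csv",
    "data-miRNA.csv",
    "data-lncRNA.csv",
    "data-othRNA.csv",
]


def count_features(csv_path):
    """Return the number of feature columns in one CSV file.

    Assumption: the first row is a header and column 0 holds the sample ID,
    so every remaining header cell is one feature.
    """
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle))
    return len(header) - 1


def partition_sizes(directory):
    """Map each partition file name to its feature count."""
    return {name: count_features(Path(directory) / name) for name in PARTITIONS}
```

If the layout assumptions hold, `sum(partition_sizes(extracted_dir).values())` should equal 36,017, where `extracted_dir` is whatever (hypothetical) directory the archive was unpacked into.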
This dataset can be used for tumor classification and identification, gene co-expression network analysis, protein-protein interaction network analysis, and similar tasks. The data are provided in CSV format, with sample IDs as row headers and feature IDs as column headers, following the standard TCGA format.
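
To illustrate the row and column layout described above, here is a minimal Python sketch that parses a CSV in this shape and computes the Pearson correlation between two features across samples, which is one common building block of a co-expression network. It assumes the first row holds feature IDs and the first column holds sample IDs. The function names are hypothetical, and the dataset authors do not prescribe Pearson correlation. For the full 10,078 x 36,017 matrix, a dataframe or array library would likely be more practical than the standard library used here.

```python
import csv
import math


def read_expression(handle):
    """Parse a CSV stream into (feature_ids, {sample_id: [values]}).

    Assumption: row 0 holds feature IDs and column 0 holds sample IDs.
    """
    reader = csv.reader(handle)
    feature_ids = next(reader)[1:]
    samples = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    return feature_ids, samples


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression(feature_ids, samples, feature_a, feature_b):
    """Correlate two named features across all samples."""
    i, j = feature_ids.index(feature_a), feature_ids.index(feature_b)
    rows = list(samples.values())
    return pearson([r[i] for r in rows], [r[j] for r in rows])
```

A file can be passed in with `open(path, newline="")`. Feature pairs whose correlation exceeds a chosen threshold could then be treated as edges of a co-expression graph.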
Please feel free to reach out if you encounter any bugs or have suggestions for improvements.
Provider:
Harvard Dataverse
Created:
2024-09-28



