TCGA33Tumors

Source: DataCite Commons. Updated: 2025-01-10. Indexed: 2025-04-15.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/FZYERP
Description:
The TCGA33Tumors dataset is a collection of FPKM-normalized RNA-Seq gene expression values covering 10,078 samples across 33 tumor types, with 36,017 features. The data were extracted from the TCGA repository using the TCGAbiolinks package in R in a Bioconductor-enabled environment. All R code needed to reproduce the dataset is included. In addition to the full matrix, the data are partitioned into four files by feature type.

File tcga33main.tar.gz contents:
1. data.csv (10,078 samples x 36,017 features)
2. labels.csv (class labels for 10,078 samples)

File tcga33tumors.tar.gz contents:
1. data-mRNA.csv (data partition with 17,659 mRNA features)
2. data-miRNA.csv (data partition with 639 miRNA features)
3. data-lncRNA.csv (data partition with 8,961 lncRNA features)
4. data-othRNA.csv (data partition with 8,758 other RNA features)
5. labels.csv (class labels for 10,078 samples)
6. tumors.csv (short names of the 33 tumors, in the same order as in the labels)

The dataset can be used for tumor classification and identification, gene co-expression network analysis, protein-protein interaction network analysis, and similar tasks. The data are provided in CSV format, where row headers are sample IDs and column headers are feature IDs, following the TCGA standard format.

Please feel free to reach out if you encounter any bugs or have suggestions for improvements.
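The file layout described above can be sanity-checked after extraction. The following is a minimal sketch, not part of the dataset's own tooling: it assumes each partition CSV has a header row of feature IDs and a first column of sample IDs, as the description states, and that the archive has been unpacked into a local directory. The names `csv_shape` and `check_partitions` are hypothetical helpers introduced here for illustration.

```python
import csv
import os

# Feature counts per partition, taken from the dataset description.
EXPECTED_FEATURES = {
    "data-mRNA.csv": 17_659,
    "data-miRNA.csv": 639,
    "data-lncRNA.csv": 8_961,
    "data-othRNA.csv": 8_758,
}
EXPECTED_SAMPLES = 10_078
EXPECTED_TOTAL_FEATURES = 36_017


def csv_shape(path):
    """Return (n_samples, n_features) for one CSV file.

    Assumes the first row holds feature IDs and the first column
    holds sample IDs, so neither is counted as data.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_features = len(header) - 1  # exclude the sample-ID column
        n_samples = sum(1 for row in reader if row)  # skip blank lines
    return n_samples, n_features


def check_partitions(directory="."):
    """Compare the four partition files with the documented counts.

    Returns a list of human-readable mismatch messages (empty if all match).
    """
    problems = []
    total_features = 0
    for name, expected in EXPECTED_FEATURES.items():
        n_samples, n_features = csv_shape(os.path.join(directory, name))
        total_features += n_features
        if n_samples != EXPECTED_SAMPLES:
            problems.append(
                f"{name}: {n_samples} samples, expected {EXPECTED_SAMPLES}"
            )
        if n_features != expected:
            problems.append(
                f"{name}: {n_features} features, expected {expected}"
            )
    if total_features != EXPECTED_TOTAL_FEATURES:
        problems.append(
            f"partitions sum to {total_features} features, "
            f"expected {EXPECTED_TOTAL_FEATURES}"
        )
    return problems
```

Calling `check_partitions("path/to/extracted/tcga33tumors")` (the path is a placeholder) would return an empty list if every partition matches the documented shape. The check streams each file row by row, so it does not need to hold a full expression matrix in memory.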
Provider:
Harvard Dataverse
Created:
2024-09-28