
Multi-omic and survival datasets used for "DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data"

Source: Figshare. Updated: 2021-06-24. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Multi-omic_and_survival_datasets_used_for_DeepProg_an_ensemble_of_deep-learning_and_machine-learning_models_for_prognosis_prediction_using_multi-omics_data_/14832813/1
Description:
We obtained the 32 cancer multi-omic datasets from NCBI using the TCGA portal (https://tcgadata.nci.nih.gov/tcga/). We used the package TCGA-Assembler (version 2.0.5) and wrote custom scripts to download RNA-Seq (UNC IlluminaHiSeq RNASeqV2), miRNA sequencing (BCGSC IlluminaHiSeq, Level 3), and DNA methylation (JHU-USC HumanMethylation450) data from the TCGA website on November 4-14, 2017. We also obtained the survival information from the GDC portal: https://portal.gdc.cancer.gov/. We used the same preprocessing steps as detailed in our previous study. We first downloaded RNA-Seq, miRNA-Seq, and methylation data using the TCGA-Assembler functions DownloadRNASeqData, DownloadmiRNASeqData, and DownloadMethylationData, respectively. Then, we processed the data with the functions ProcessRNASeqData, ProcessmiRNASeqData, and ProcessMethylation450Data. In addition, we processed the methylation data with the function CalculateSingleValueMethylationData. Finally, for each omic data type, we created a gene-by-sample data matrix in tab-separated values (TSV) format using a custom script.
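
The final step, writing a gene-by-sample matrix as TSV, might look roughly like the sketch below. This is not the authors' custom script, which is not included in this description. The function name `write_gene_by_sample_tsv`, the rows-are-genes orientation, the `NA` placeholder for missing values, and the toy gene and sample identifiers are all assumptions made for illustration.

```python
import csv
import io


def write_gene_by_sample_tsv(values, handle):
    """Write {gene: {sample: value}} as TSV, one row per gene, one column per sample.

    Hypothetical helper: the orientation and the "NA" placeholder are assumptions.
    """
    # Collect every sample seen for any gene so that all rows share the same columns.
    samples = sorted({s for per_gene in values.values() for s in per_gene})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in sorted(values):
        row = [values[gene].get(sample, "NA") for sample in samples]
        writer.writerow([gene] + row)
    return samples


# Toy values only; real matrices would come from the processed TCGA-Assembler output.
toy = {
    "TP53": {"TCGA-AA-0001": 5.2, "TCGA-AA-0002": 3.1},
    "BRCA1": {"TCGA-AA-0001": 1.7},
}
buffer = io.StringIO()
write_gene_by_sample_tsv(toy, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Writing to a file instead of an in-memory buffer only requires passing a handle opened with `open(path, "w", newline="")`.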

Provided by:
Poirion, Olivier; Deep, Kumard; Jing, Zheng; Garmire, Lana; Huang, Sijia
Created:
2021-06-24