Multi-omics pan-cancer TCGA data
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/sophiep96/multitask_impute
Resource description:
This dataset contains RNA-seq, methylation, and miRNA data from patients across 33 cancer types, intended for a range of prediction tasks. The study used deep imputation to handle missing data, which allows patients with incomplete data to be included while retaining the most informative features to improve model performance. After imputation, the dataset comprises 9,671 patients. The tasks include cancer type prediction, age prediction, and survival prediction, framed as classification and regression tasks.
Provider:
The Cancer Genome Atlas (TCGA)



