MLOmics: Cancer Multi-Omics Database for Machine Learning
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-05-07.
Download link:
https://figshare.com/articles/dataset/MLOmics_Cancer_Multi-Omics_Database_for_Machine_Learning/28729127/1
Description:
Framing the investigation of diverse cancers as a machine learning problem has recently shown significant potential in multi-omics analysis and cancer research. These successful machine learning models depend on high-quality training datasets with sufficient data volume and adequate preprocessing. However, although several public data portals exist, including The Cancer Genome Atlas (TCGA) multi-omics initiative and open databases such as LinkedOmics, these databases are not off-the-shelf for existing machine learning models. We propose MLOmics, an open cancer multi-omics database that aims to better serve the development and evaluation of bioinformatics and machine learning models. MLOmics contains 8,314 patient samples covering all 32 cancer types with four omics types, stratified features, and extensive baselines. Complementary support for downstream analysis and bio-knowledge linking is also included to support interdisciplinary analysis.
Provider:
figshare
Created:
2025-04-20
Collected Summary
Dataset Introduction

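Background and Challenges
Background Overview
MLOmics is an open cancer multi-omics database intended to better support the development and evaluation of bioinformatics and machine learning models. It contains 8,314 patient samples covering 32 cancer types and four omics types, provides stratified features and extensive baselines, and supports downstream analysis and bio-knowledge linking.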
The content above was collected and summarized by 遇见数据集.



