Data Sheet 1_Serum metabolomics-based diagnostic biomarkers for colorectal cancer: insights and multi-omics validation.docx
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Serum_metabolomics-based_diagnostic_biomarkers_for_colorectal_cancer_insights_and_multi-omics_validation_docx/30452660
Description:
Background: Colorectal cancer (CRC) remains one of the leading causes of cancer-related mortality worldwide, primarily due to delayed diagnosis. There is an urgent need for sensitive, noninvasive biomarkers that can facilitate early detection and improve clinical outcomes.
Methods: In this study, we performed untargeted metabolomic profiling of serum samples from 715 participants (248 CRC patients and 467 noncancer controls, NCC) using liquid chromatography-mass spectrometry (LC-MS). Differential metabolites were identified through statistical filtering and multivariate analysis, followed by pathway enrichment to elucidate biologically relevant dysregulations. Subsequently, machine learning algorithms, including Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were applied to construct predictive models. As a complementary approach, we also profiled cfDNA methylation patterns in a subset of samples and developed a multi-omics classifier integrating metabolite and epigenetic features.
Results: We identified 26 CRC-associated serum metabolites, many of which mapped to dysregulated pathways such as primary bile acid biosynthesis and taurine/hypotaurine metabolism, suggesting active reprogramming of host-microbiota metabolic axes in CRC pathogenesis. A metabolomics-based diagnostic model built using ten selected metabolites demonstrated excellent discriminatory performance, achieving an area under the receiver operating characteristic curve (AUROC) of 0.96-0.97 and accuracies up to 92.5% across multiple machine learning methods. Integration of cell-free DNA (cfDNA) methylation markers yielded a multi-omics model with slightly enhanced performance (AUROC=0.98), but the gain over the metabolomics-only model was modest.
Conclusion: This study reveals distinct serum metabolic signatures and pathway disruptions in CRC patients and presents a high-performance, minimally invasive diagnostic model based solely on metabolomics data. While the integration of methylation features offers incremental benefit, metabolomics remains the dominant predictor, underscoring its potential as a standalone platform for early CRC screening and precision medicine.
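For readers less familiar with the two metrics reported in the abstract, the following is a minimal sketch of how AUROC and accuracy can be computed from a classifier's predicted scores. It is not the authors' pipeline: the function names `auroc` and `accuracy`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration, and none of the numbers come from the study data.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 (1 = case, 0 = control); scores: predicted scores.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data (NOT from the study): 3 cases, 4 controls.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auroc = auroc(toy_labels, toy_scores)
toy_accuracy = accuracy(toy_labels, toy_scores)
print(f"AUROC = {toy_auroc:.3f}, accuracy = {toy_accuracy:.3f}")
```

AUROC is threshold-free, whereas accuracy depends on the chosen cutoff and on class balance, which is one reason the two figures in an abstract like this one need not move together. The pairwise loop is quadratic in sample count; a rank-based formulation would be preferable for large cohorts.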
Created:
2025-10-27



