Clinical prediction model for recurrent rectal cancer

Name: Clinical prediction model for recurrent rectal cancer
Creator: Thammasat University
Published: 2026-05-01 01:40:21
License: No description available

Source: DataCite Commons (updated 2026-05-01; indexed 2026-05-04)

Download link:

http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/TU.the.2025.291

Description:

Background: Recurrence after curative-intent treatment for rectal cancer remains a major clinical challenge, and accurate prediction is critical for optimizing surveillance and improving outcomes. While conventional regression models are widely used, machine learning (ML) algorithms may offer superior predictive performance by capturing complex, non-linear relationships among clinical variables.

Objective: The primary objective was to develop and compare prediction models for recurrent rectal cancer using conventional regression and several ML algorithms, and to identify the most effective approach in terms of predictive accuracy. The secondary objective was to evaluate whether ML models provide greater predictive utility than regression-based models in a clinical decision-making context.

Methods: A retrospective cohort study was conducted using data from 581 patients with rectal adenocarcinoma who underwent curative-intent treatment between 2013 and 2022 at two tertiary centers in Thailand. Seventeen clinical, pathological, and treatment-related features were selected for model development. Five algorithms (Logistic Regression, Random Forest, Support Vector Machine with an RBF kernel, XGBoost, and LightGBM) were trained and evaluated using stratified 80:20 train–test splits, five-fold cross-validation, and multiple performance metrics, including the area under the ROC curve (AUC), average precision (AP), and F1 score.

Results: All models achieved moderate discriminatory performance (AUC range: 0.626–0.662; AP range: 0.483–0.584). Random Forest had the highest AP (0.584), indicating the best precision–recall balance on imbalanced data. XGBoost achieved the highest AUC (0.662), while Logistic Regression attained perfect sensitivity at its F1-optimised threshold, missing no recurrence cases. Across all models, key predictors included pre-treatment carcinoembryonic antigen (CEA) level, nodal status, surgical margin status, tumor stage, and lymphovascular invasion.

Conclusions: Among the five models evaluated, Random Forest offered the strongest balance between sensitivity and precision, fulfilling the primary objective of identifying the best-performing model. ML algorithms, particularly tree-based ensembles, demonstrated measurable advantages over conventional logistic regression in predicting recurrence, supporting their integration into clinical decision-support systems. External validation and the incorporation of additional molecular and imaging variables are recommended to improve model generalizability and performance.
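The evaluation protocol summarized in the abstract includes a stratified 80:20 split, five-fold cross-validation, AUC, AP, and an F1-optimised threshold. The dependency-free Python sketch below illustrates what each of these steps computes. It is not the authors' code, which this record does not include. The function names, the toy labels and scores, and the rule "predict recurrence when score >= threshold" are all hypothetical assumptions. AP uses the step-wise definition, the sum of (R_k - R_{k-1}) * P_k, which is the same definition scikit-learn documents for `average_precision_score`.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hold out test_frac of each class so train and test keep the same class ratio."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels: Sequence[int], k: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Deal each class round-robin into k folds so every fold has a similar class ratio."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def roc_auc(y: Sequence[int], score: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, lab in zip(score, y) if lab == 1]
    neg = [s for s, lab in zip(score, y) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y: Sequence[int], score: Sequence[float]) -> float:
    """Step-wise AP: sum of (R_k - R_{k-1}) * P_k over descending score cut-offs."""
    pairs = sorted(zip(score, y), key=lambda p: -p[0])
    n_pos = sum(y)
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(pairs):
        cut = pairs[i][0]
        # Tied scores enter together, as they would under a single threshold.
        while i < len(pairs) and pairs[i][0] == cut:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def f1_optimal_threshold(y: Sequence[int],
                         score: Sequence[float]) -> Tuple[float, float, float]:
    """Try each distinct score t as 'predict recurrence if score >= t'.

    Returns (threshold, F1, sensitivity) for the cut-off with the highest F1.
    """
    n_pos = sum(y)
    best = (0.0, -1.0, 0.0)
    for t in sorted(set(score)):
        tp = sum(1 for s, lab in zip(score, y) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(score, y) if s >= t and lab == 0)
        fn = n_pos - tp
        f1 = 2 * tp / (2 * tp + fp + fn)
        if f1 > best[1]:
            best = (t, f1, tp / n_pos)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: 1 = recurrence; scores stand in for any fitted model.
    y = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
    print("AUC", round(roc_auc(y, score), 3))            # 0.792
    print("AP ", round(average_precision(y, score), 3))  # 0.747
    print("F1-optimal (t, F1, sensitivity):", f1_optimal_threshold(y, score))
```

In a study setup like the one described, each model would be fitted on the training folds, and these metrics would then be computed on the held-out 20%. In practice, scikit-learn's `train_test_split(..., stratify=y)`, `StratifiedKFold`, `roc_auc_score`, and `average_precision_score` cover the same steps. The threshold scan also shows one way a model can reach perfect sensitivity at its F1-optimised cut-off: if the best F1 falls at a low threshold, every recurrence is flagged, at the cost of more false positives.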

Provider:

Thammasat University

Created:

2026-05-01
