Development and performance of female breast cancer incidence risk prediction models: a systematic review and meta-analysis

Name: Development and performance of female breast cancer incidence risk prediction models: a systematic review and meta-analysis
Creator: Kao, Chunyu; Wang, Yongjiu; Wang, Fei; Wang, Di; Yu, Zhigang; Yu, Lixiang; Zhang, Ziyu; Zhou, Peng; Hou, Lijuan; Liu, Liyuan
Published: 2025-12-21 00:00:00
License: No description available

Taylor & Francis Group. Updated: 2025-12-21. Indexed: 2026-04-16.

Download link:

https://tandf.figshare.com/articles/dataset/Development_and_performance_of_female_breast_cancer_incidence_risk_prediction_models_a_systematic_review_and_meta-analysis/29606095/1

Description:

Accurate breast cancer risk prediction is essential for early detection and personalized prevention strategies. While traditional models, such as Gail and Tyrer–Cuzick, are widely utilized, machine learning-based approaches may offer enhanced predictive performance. This systematic review and meta-analysis compares the accuracy of traditional statistical models and machine learning models in breast cancer risk prediction. A total of 144 studies from 27 countries were systematically reviewed, incorporating genetic, clinical, and imaging data. Pooled C-statistics were calculated to assess model discrimination, while observed-to-expected (O/E) ratios were used to evaluate calibration. Subgroup and sensitivity analyses were conducted to examine heterogeneity and assess the influence of study bias across various populations. Machine learning-based models demonstrated superior performance, with a pooled C-statistic of 0.74, compared to 0.67 for traditional models. Models that integrated genetic and imaging data showed the highest levels of accuracy, although performance varied by population. Sensitivity analyses excluding high-bias studies showed improved discrimination in models incorporating genetic factors, with the pooled C-statistic increasing to 0.72. Traditional models, such as Gail, exhibited notably poor predictive accuracy in non-Western populations, as evidenced by a C-statistic of 0.543 in Chinese cohorts. Machine learning models provide significantly greater predictive accuracy for breast cancer risk, particularly when incorporating multidimensional data. However, issues related to model generalizability and interpretability remain, particularly in diverse populations. Future research should focus on developing more interpretable models and expanding global validation efforts to improve model applicability across different demographic groups.
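
The description mentions pooled C-statistics but does not say which pooling method was used. As an illustration only, the sketch below shows one common approach: DerSimonian-Laird random-effects pooling on the logit scale, with standard errors converted by the delta method. The function names and the input values are hypothetical and are not taken from the dataset or the review.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (assumed method, see lead-in)."""
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Re-weight each study with the between-study variance added.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return pooled, se_pooled, tau2


def pool_c_statistics(c_stats, c_std_errors):
    """Pool C-statistics on the logit scale and back-transform."""
    y = [logit(c) for c in c_stats]
    # Delta method: SE(logit c) is approximately SE(c) / (c * (1 - c)).
    se = [s / (c * (1.0 - c)) for c, s in zip(c_stats, c_std_errors)]
    pooled, se_pooled, tau2 = pool_random_effects(y, se)
    ci = (inv_logit(pooled - 1.96 * se_pooled),
          inv_logit(pooled + 1.96 * se_pooled))
    return inv_logit(pooled), ci, tau2


# Hypothetical per-study values, for illustration only.
example_c = [0.62, 0.68, 0.71, 0.66]
example_se = [0.020, 0.015, 0.030, 0.025]
pooled_c, ci_95, tau2 = pool_c_statistics(example_c, example_se)
print(f"pooled C = {pooled_c:.3f}, 95% CI = ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence interval inside (0, 1). O/E ratios are often pooled in a similar way on the log scale.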

Provided by:

Kao, Chunyu; Wang, Yongjiu; Wang, Fei; Wang, Di; Yu, Zhigang; Yu, Lixiang; Zhang, Ziyu; Zhou, Peng; Hou, Lijuan; Liu, Liyuan

Created:

2025-07-20