Clinical Disease Risk Assessment System Based on Multi-source Genetic Information
China Scientific Data · Updated 2026-04-16 · Indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.11999/JEIT251025
Resource summary:
Objective
Complex diseases are driven by polygenic inheritance and gene–environment interactions, which produce highly heterogeneous pathogenic mechanisms and pose major challenges for both research and public health. Conventional single-trait polygenic risk scores (PRS) aggregate the genetic variants associated with an individual disease, but they ignore cross-trait genetic correlations and nonlinear genetic interactions. Multi-trait PRS approaches have been proposed to improve prediction accuracy, yet existing statistical-learning frameworks predominantly integrate PRS features linearly. They therefore fail to capture nonlinear interactions among single-nucleotide polymorphisms (SNPs) and to fully exploit the genetic information shared across diseases. To address these limitations, we propose a nonlinear multi-source disease prediction framework: an SNP–PRS fusion model termed mtSNPPRS_XGB (mtSNP-PRS XGBoost Integration Model).

Methods
The mtSNPPRS_XGB framework integrates raw SNP data for the target traits with multi-trait PRS information, using nonlinear modeling to improve genetic risk prediction for complex diseases. SNPs significantly associated with each target disease were extracted from the GWAS Catalog (p < 5 × 10⁻⁸) and encoded as allele dosages (0/1/2), while PRS weights covering 80 traits were obtained from the PGS Catalog and used to compute individual-level PRS. After standardized preprocessing, the SNP and PRS features were fused and modeled jointly with XGBoost to capture complex SNP–SNP and SNP–PRS interactions. The framework introduces two key innovations: (1) collaborative modeling of multi-trait genetic information, jointly leveraging disease-specific SNPs and cross-disease PRS; and (2) systematic learning of nonlinear genetic interactions, overcoming the linear constraints of conventional PRS-based models.

Results and Discussions
The mtSNPPRS_XGB model was evaluated on UK Biobank data across 18 complex diseases.
It achieved an average AUC of 66.70%, representing improvements of 1.04% over the elastic-net-based model and 4.39% over the conventional UniPRS model. Adding SNP features substantially improved predictive performance for diseases such as coronary heart disease, psoriasis, and celiac disease, while integrating multi-trait PRS further enhanced specificity, particularly for cardiovascular, autoimmune, and cancer-related conditions. SHAP-based interpretability analyses showed that mtSNPPRS_XGB simultaneously captures the global cross-disease genetic liability encoded by PRS and localized, disease-specific SNP effects, as illustrated for Alzheimer’s disease, colorectal cancer, gout, and ischemic stroke. These findings support both the biological plausibility and the interpretability of the proposed framework.

Conclusions
We present mtSNPPRS_XGB, a novel statistical-learning-based model for multi-trait genetic risk prediction that introduces an SNP–PRS fusion architecture and uses XGBoost to capture nonlinear interactions among multi-source genetic features. By integrating raw SNP data with multi-trait PRS, the framework significantly improves risk prediction for complex diseases, and validation across 18 diseases in the UK Biobank shows consistent gains over traditional PRS-based methods. This study overcomes the linear modeling limitations of conventional PRS approaches and provides a new paradigm for the nonlinear integration of SNPs and multi-trait PRS, offering a robust and interpretable tool for personalized genetic risk prediction in precision medicine.
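The Methods paragraph describes two feature types: target-disease SNPs encoded as allele dosages (0/1/2) and per-trait PRS computed from PGS Catalog weights, fused into one input for XGBoost. The Python sketch below illustrates how such a fused feature matrix could be assembled. It is not the authors' code. The function names, the toy genotypes and weights, the assumption of complete genotype calls with a known effect allele per SNP, and the choice to z-score only the PRS columns are all hypothetical, since the abstract does not say what "standardized preprocessing" involves. Each PRS is computed as the usual weighted sum of effect-allele dosages, PRS_i = Σ_j w_j · x_ij.

```python
# Hypothetical sketch of SNP + multi-trait PRS feature fusion (stdlib only).
# Assumptions (not from the source): complete genotype calls, a known effect
# allele per SNP, PRS = weighted sum of effect-allele dosages, and z-scoring
# applied to the PRS columns only.
from statistics import mean, pstdev

Genotype = tuple[str, str]


def allele_dosage(genotype: Genotype, effect_allele: str) -> int:
    """Number of effect-allele copies carried: 0, 1 or 2."""
    return sum(1 for allele in genotype if allele == effect_allele)


def encode_dosages(genotypes: dict[str, Genotype],
                   effect_alleles: dict[str, str]) -> dict[str, int]:
    """Dosage for every SNP that has a known effect allele."""
    return {snp: allele_dosage(genotypes[snp], allele)
            for snp, allele in effect_alleles.items()}


def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """PRS_i = sum_j w_j * x_ij over the variants in one trait's weight set."""
    return sum(w * dosages[snp] for snp, w in weights.items())


def zscore(values: list[float]) -> list[float]:
    """Standardize a column; a constant column becomes all zeros."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def build_feature_matrix(cohort: list[dict[str, Genotype]],
                         effect_alleles: dict[str, str],
                         target_snps: list[str],
                         prs_weights: dict[str, dict[str, float]]) -> list[list[float]]:
    """One row per person: raw target-SNP dosages, then one z-scored PRS per trait."""
    dosages = [encode_dosages(person, effect_alleles) for person in cohort]
    snp_block = [[d[snp] for snp in target_snps] for d in dosages]
    prs_columns = [zscore([polygenic_score(d, w) for d in dosages])
                   for w in prs_weights.values()]
    # Transpose the per-trait PRS columns into per-person rows.
    prs_block = ([list(row) for row in zip(*prs_columns)]
                 if prs_columns else [[] for _ in cohort])
    return [snp_row + prs_row for snp_row, prs_row in zip(snp_block, prs_block)]


# Toy example: three people, two target SNPs, PRS for two hypothetical traits.
effect_alleles = {"rs1": "A", "rs2": "G", "rs3": "T"}
cohort = [
    {"rs1": ("A", "A"), "rs2": ("C", "G"), "rs3": ("T", "T")},
    {"rs1": ("A", "C"), "rs2": ("C", "C"), "rs3": ("C", "T")},
    {"rs1": ("C", "C"), "rs2": ("G", "G"), "rs3": ("C", "C")},
]
prs_weights = {"trait_a": {"rs1": 0.2, "rs3": 0.1}, "trait_b": {"rs2": -0.3}}
for row in build_feature_matrix(cohort, effect_alleles, ["rs1", "rs2"], prs_weights):
    print(row)
```

The rows of such a matrix, paired with case/control labels, would then be passed to a gradient-boosted tree learner such as `xgboost.XGBClassifier`. Because tree splits are largely invariant to monotone rescaling of a feature, the z-scoring step matters more for linear baselines like the elastic-net comparison than for XGBoost itself.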
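The Results report performance as an average AUC of 66.70%. An AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, so 0.5 corresponds to chance and 1.0 to perfect ranking. The minimal Python sketch below implements that pairwise (Mann-Whitney) definition. Treating the 18-disease figure as an unweighted arithmetic mean of per-disease AUCs is an assumption made here for illustration, and the helper names and toy scores are hypothetical.

```python
# Pairwise (Mann-Whitney) definition of ROC AUC; O(cases x controls), stdlib only.
from statistics import mean


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random case (label 1) outscores a random control (label 0).

    Ties between a case and a control count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def average_auc(per_disease: dict[str, tuple[list[float], list[int]]]) -> float:
    """Unweighted mean of per-disease AUCs (an assumed aggregation rule)."""
    return mean(roc_auc(scores, labels) for scores, labels in per_disease.values())


# Illustrative toy data, not UK Biobank results.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop keeps the definition explicit; for cohort-scale data the same quantity is normally computed from ranks in O(n log n), for example with `sklearn.metrics.roc_auc_score`.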
Created:
2026-04-16



