Accurate and robust genomic prediction of celiac disease using statistical learning
DataCite Commons. Updated 2020-09-05; indexed 2024-07-25.
Download link:
https://figshare.com/articles/dataset/Accurate_and_robust_genomic_prediction_of_celiac_disease_using_statistical_learning/154193/2
Description:
Practical application of genomic-based risk stratification to clinical diagnosis is appealing, yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European CD cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87–0.89) and in independent replication across cohorts (AUC of 0.86–0.9), despite differences in ethnicity. The models explained 30–35% of disease variance and less than 50% of heritability. The GRS’s utility was assessed in different screening settings. For known family history (10% prevalence), the GRS captured 35% of cases with 1 misdiagnosis per correct diagnosis. In the population-wide setting (1% prevalence), the GRS captured 10% of cases with 5 misdiagnoses per correct diagnosis. Comparable to HLA typing, the GRS can identify individuals without CD with >99.6% negative predictive value and, unlike HLA typing, patients can be stratified for further, more invasive and costly testing. Despite explaining a minority of disease heritability, our findings indicate a predictive GRS provides clinically relevant information to improve upon current diagnostic pathways for CD, and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.
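The screening figures in the description can be unpacked with a little arithmetic. The sketch below is a back-of-envelope illustration, not the authors' analysis: it assumes that "misdiagnoses per correct diagnosis" means false positives per true positive, and that "captured" means sensitivity. The function name `implied_screening_metrics` and its parameters are hypothetical, introduced only for this example. Under those assumptions it derives the positive predictive value (PPV) and the specificity implied by each setting.

```python
def implied_screening_metrics(prevalence, sensitivity, false_pos_per_true_pos):
    """Derive PPV and specificity implied by the stated screening figures.

    Assumption (not from the source): a "misdiagnosis" is a false positive
    and a "correct diagnosis" is a true positive. All quantities are
    expressed as fractions of the screened population.
    """
    true_pos = prevalence * sensitivity            # cases flagged by the GRS
    false_pos = true_pos * false_pos_per_true_pos  # non-cases flagged by the GRS
    ppv = true_pos / (true_pos + false_pos)
    specificity = 1.0 - false_pos / (1.0 - prevalence)
    return {"ppv": ppv, "specificity": specificity}


# Family-history setting: 10% prevalence, 35% of cases captured, 1 per 1.
family = implied_screening_metrics(0.10, 0.35, 1)

# Population-wide setting: 1% prevalence, 10% of cases captured, 5 per 1.
population = implied_screening_metrics(0.01, 0.10, 5)

print(f"family history:  PPV={family['ppv']:.3f}, specificity={family['specificity']:.4f}")
print(f"population-wide: PPV={population['ppv']:.3f}, specificity={population['specificity']:.4f}")
```

Read this way, one misdiagnosis per correct diagnosis corresponds to a PPV of one half, and five per correct diagnosis to a PPV of one sixth. The negative predictive value quoted in the description refers to a separate operating point and is not reproduced by this sketch.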
Provider:
figshare
Created:
2016-01-11



