Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

NIAID Data Ecosystem, indexed 2026-03-07
Download link:
https://figshare.com/articles/dataset/_Prediction_of_Complex_Human_Traits_Using_the_Genomic_Best_Linear_Unbiased_Predictor_/744789
Description:
Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and quantitative trait loci (QTL), the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. Under imperfect LD between markers and QTL, however, prediction R² based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, within-family disequilibrium makes the patterns of realized genomic similarity similar across the genome; b is therefore close to one, which induces only a small decrease in R². With distantly related individuals, however, b reaches very low values, which imposes a very low upper bound on prediction R². Our simulations suggest that, for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
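To make the two central quantities of the description concrete, the following is a minimal Python sketch, not the authors' code. It assumes standardized marker genotypes, a genomic relationship matrix computed as G = ZZ'/p, a known heritability `h2`, and a toy simulation with independent markers; every function name and parameter value is hypothetical. It shows G-BLUP prediction, its equivalence with ridge regression on markers, and one possible estimate of b taken over the off-diagonal entries of the relationship matrices.

```python
import numpy as np

def standardize(X):
    """Center and scale each marker column to mean 0 and variance 1."""
    X = X - X.mean(axis=0)
    return X / X.std(axis=0)

def genomic_relationship(Z_a, Z_b=None):
    """Genomic relationships between rows of Z_a and Z_b: Z_a Z_b' / p."""
    if Z_b is None:
        Z_b = Z_a
    return Z_a @ Z_b.T / Z_a.shape[1]

def gblup_predict(Z_train, y_train, Z_test, h2):
    """G-BLUP prediction for test individuals, assuming heritability h2 is known."""
    n = Z_train.shape[0]
    lam = (1.0 - h2) / h2  # residual variance / genomic variance
    mu = y_train.mean()
    G_tt = genomic_relationship(Z_train)
    G_st = genomic_relationship(Z_test, Z_train)
    alpha = np.linalg.solve(G_tt + lam * np.eye(n), y_train - mu)
    return mu + G_st @ alpha

def ridge_predict(Z_train, y_train, Z_test, h2):
    """Ridge regression on markers with penalty p * (1 - h2) / h2."""
    p = Z_train.shape[1]
    lam = p * (1.0 - h2) / h2
    mu = y_train.mean()
    lhs = Z_train.T @ Z_train + lam * np.eye(p)
    beta = np.linalg.solve(lhs, Z_train.T @ (y_train - mu))
    return mu + Z_test @ beta

def relationship_regression_b(Z_markers, Z_causal):
    """Slope of marker-derived relationships regressed on causal-locus ones.

    Assumption: only off-diagonal (between-individual) entries are used.
    """
    G_m = genomic_relationship(Z_markers)
    G_q = genomic_relationship(Z_causal)
    iu = np.triu_indices_from(G_m, k=1)
    x, y = G_q[iu], G_m[iu]
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Toy simulation: all sizes and parameters below are illustrative assumptions.
rng = np.random.default_rng(0)
n, p, n_causal, h2 = 250, 500, 50, 0.5
freq = rng.uniform(0.1, 0.9, size=p)
Z = standardize(rng.binomial(2, freq, size=(n, p)).astype(float))
causal = rng.choice(p, size=n_causal, replace=False)
g = Z[:, causal] @ rng.normal(size=n_causal)
g *= np.sqrt(h2) / g.std()  # scale genetic values to variance h2
y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

train, test = np.arange(200), np.arange(200, n)
pred_gblup = gblup_predict(Z[train], y[train], Z[test], h2)
pred_ridge = ridge_predict(Z[train], y[train], Z[test], h2)
print("G-BLUP equals ridge:", np.allclose(pred_gblup, pred_ridge))
print("prediction R2:", np.corrcoef(pred_gblup, y[test])[0, 1] ** 2)
print("b (all markers vs. causal loci):", relationship_regression_b(Z, Z[:, causal]))
```

The simulated markers are drawn independently, so the sketch does not reproduce the LD or family structure of real human data. It only illustrates how the quantities are computed, not the magnitudes reported in the study.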

Created: 2013-07-11