Improving the coverage of credible sets in Bayesian genetic fine-mapping
Source: NIAID Data Ecosystem; indexed 2026-03-11
Download link:
https://figshare.com/articles/dataset/Improving_the_coverage_of_credible_sets_in_Bayesian_genetic_fine-mapping/12120570
Description:
Genome-wide association studies (GWAS) have successfully identified thousands of loci associated with human diseases. Bayesian genetic fine-mapping studies aim to identify the specific causal variants within GWAS loci responsible for each association. They report credible sets of plausible causal variants, which are interpreted as containing the causal variant with some “coverage probability”. Here, we use simulations to demonstrate that the coverage probabilities are over-conservative in most fine-mapping situations. We show that this is because fine-mapping data sets are not randomly selected from amongst all causal variants, but from amongst causal variants with larger effect sizes. We present a method to re-estimate the coverage of credible sets using rapid simulations based on the observed, or estimated, SNP correlation structure; we call this the “adjusted coverage estimate”. This is extended to find “adjusted credible sets”, each of which is the smallest set of variants whose adjusted coverage estimate meets the target coverage. We use our method to improve the resolution of a fine-mapping study of type 1 diabetes. We found that in 27 out of 39 associated genomic regions our method could reduce the number of potentially causal variants to consider for follow-up, and that none of the 95% or 99% credible sets required the inclusion of more variants, a pattern matched in simulations of well-powered GWAS. Crucially, our method requires only GWAS summary statistics and remains accurate when SNP correlations are estimated from a large reference panel. Using our method to improve the resolution of fine-mapping studies will enable more efficient expenditure of resources in the follow-up process of annotating the variants in the credible set to determine the implicated genes and pathways in human diseases.
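To make the notions of a credible set and its empirical coverage concrete, the following is a minimal Python sketch and not the authors' implementation. It rests on strong simplifying assumptions: a single causal variant per region, equal prior probabilities, and SNPs that are independent of one another, so it ignores the SNP correlation structure on which the described method relies. The function names (`credible_set`, `posterior_probs`, `empirical_coverage`) and the parameters `shrink` and `mu` are hypothetical and chosen only for illustration.

```python
import math
import random


def credible_set(pp, target=0.95):
    """Return indices of the smallest set of variants whose posterior
    probabilities sum to at least `target` (variants taken in descending
    order of posterior probability)."""
    order = sorted(range(len(pp)), key=lambda i: -pp[i])
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += pp[i]
        if total >= target:
            break
    return chosen


def posterior_probs(z, shrink=0.8):
    """Posterior probabilities of causality from z-scores.

    Assumption: one causal variant, equal priors, and the same standard
    error at every SNP, so each log Bayes factor is shrink * z^2 / 2 up
    to a shared constant. `shrink` is a hypothetical stand-in for
    W / (V + W), the ratio of prior to total variance.
    """
    log_bf = [shrink * zi * zi / 2.0 for zi in z]
    m = max(log_bf)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in log_bf]
    s = sum(weights)
    return [w / s for w in weights]


def empirical_coverage(n_snps=50, mu=5.0, target=0.95, n_sims=2000, seed=1):
    """Fraction of simulated regions whose credible set contains the
    causal variant. Assumption: SNPs are independent (no LD), which the
    method described above does NOT assume."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        causal = rng.randrange(n_snps)
        z = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
        z[causal] += mu  # the causal variant has expected z-score mu
        if causal in credible_set(posterior_probs(z), target):
            hits += 1
    return hits / n_sims
```

Comparing `empirical_coverage(mu=..., target=0.95)` with the nominal 0.95 across several values of `mu` gives a rough feel for how the realized coverage of a credible set can differ from the claimed one. A faithful re-estimate would simulate z-scores from the observed or reference-panel SNP correlation matrix instead of drawing them independently.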
Created:
2020-04-13



