Genetic Variant Set-Based Tests Using the Generalized Berk–Jones Statistic With Application to a Genome-Wide Association Study of Breast Cancer
Source: DataCite Commons. Updated 2021-09-29; indexed 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Genetic_Variant_Set-Based_Tests_Using_the_Generalized_Berk_Jones_Statistic_With_Application_to_a_Genome-Wide_Association_Study_of_Breast_Cancer/9816446/3
Description:
Studying the effects of groups of single nucleotide polymorphisms (SNPs), as in a gene, genetic pathway, or network, can provide novel insight into complex diseases such as breast cancer, uncovering new genetic associations and augmenting the information that can be gleaned from studying SNPs individually. Common challenges in set-based genetic association testing include weak effect sizes, correlation between SNPs in a SNP-set, and scarcity of signals, with individual SNP effects often ranging from extremely sparse to moderately sparse in number. Motivated by these challenges, we propose the Generalized Berk–Jones (GBJ) test for the association between a SNP-set and outcome. The GBJ extends the Berk–Jones statistic by accounting for correlation among SNPs, and it provides advantages over the Generalized Higher Criticism test when signals in a SNP-set are moderately sparse. We also provide an analytic p-value calculation for SNP-sets of any finite size, and we develop an omnibus statistic that is robust to the degree of signal sparsity. An additional advantage of our work is the ability to conduct inference using individual SNP summary statistics from a genome-wide association study (GWAS). We evaluate the finite sample performance of the GBJ through simulation and apply the method to identify breast cancer risk genes in a GWAS conducted by the Cancer Genetic Markers of Susceptibility Consortium. Our results suggest evidence of association between FGFR2 and breast cancer and also identify other potential susceptibility genes, complementing conventional SNP-level analysis. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
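The abstract does not state the statistic itself. For orientation only, here is a minimal sketch of one commonly used form of the classic Berk–Jones statistic for d independent tests, which is the baseline the GBJ generalizes. It starts from per-SNP Z-scores (as GWAS summary statistics would supply), converts them to two-sided p-values p_(1) ≤ … ≤ p_(d), and takes BJ = max over k ≤ d/2 of d·D(k/d, p_(k)), keeping only terms with p_(k) < k/d, where D(x, y) = x log(x/y) + (1 − x) log((1 − x)/(1 − y)). This is NOT the GBJ: it ignores correlation among SNPs and calibrates by Monte Carlo under an assumed independent N(0, 1) null, rather than using the correlation-aware analytic p-value described above. All function names and the example Z-scores are hypothetical.

```python
import math
import random


def two_sided_p(z):
    # Two-sided normal p-value 2 * (1 - Phi(|z|)), computed via erfc for tail accuracy.
    return math.erfc(abs(z) / math.sqrt(2.0))


def kl_bernoulli(x, y):
    # Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(y); requires 0 < x, y < 1.
    return x * math.log(x / y) + (1.0 - x) * math.log((1.0 - x) / (1.0 - y))


def berk_jones(z_scores):
    # Classic one-sided Berk-Jones statistic for d INDEPENDENT tests (hypothetical helper):
    # max over k <= d/2 of d * D(k/d, p_(k)), counting only terms with p_(k) < k/d.
    d = len(z_scores)
    p_sorted = sorted(two_sided_p(z) for z in z_scores)
    best = 0.0
    for k in range(1, d // 2 + 1):
        u = k / d  # observed fraction of p-values <= p_(k); its null expectation is p_(k)
        p = p_sorted[k - 1]
        # Terms whose p-value underflows to exactly 0 are skipped in this sketch.
        if 0.0 < p < u:
            best = max(best, d * kl_bernoulli(u, p))
    return best


def monte_carlo_pvalue(stat, d, n_sim=2000, seed=0):
    # ASSUMPTION: independent N(0, 1) Z-scores under the null. The GBJ instead accounts
    # for SNP correlation and provides an analytic p-value; this sketch does neither.
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if berk_jones(null_z) >= stat:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical marginal Z-scores for an 8-SNP set (not data from the study).
    z = [3.1, 2.8, 0.4, -0.2, 1.1, -0.7, 0.3, 2.5]
    bj = berk_jones(z)
    print(f"BJ = {bj:.3f}, Monte Carlo p = {monte_carlo_pvalue(bj, len(z)):.4f}")
```

Restricting the maximum to k ≤ d/2 focuses the statistic on the smallest p-values, which is where sparse or moderately sparse signals show up. Replacing the independent-null calibration with one that reflects the SNP correlation matrix is exactly the gap the GBJ is designed to close.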
Provider:
Taylor & Francis
Created:
2021-09-29



