Simultaneous Detection of Signal Regions Using Quadratic Scan Statistics With Applications to Whole Genome Association Studies
Source: DataCite Commons. Updated 2022-06-08; indexed 2024-07-28.
Download link:
https://tandf.figshare.com/articles/dataset/Simultaneous_Detection_of_Signal_Regions_Using_Quadratic_Scan_Statistics_With_Applications_to_Whole_Genome_Association_Studies/12951696/1
Description:
We consider in this article detection of signal regions associated with disease outcomes in whole genome association studies. Gene- or region-based methods have become increasingly popular in whole genome association analysis as a complementary approach to traditional individual variant analysis. However, these methods test for the association between an outcome and the genetic variants in a prespecified region, for example, a gene. In view of massive intergenic regions in whole genome sequencing (WGS) studies, we propose a computationally efficient quadratic scan (Q-SCAN) statistic based method to detect the existence and the locations of signal regions by scanning the genome continuously. The proposed method accounts for the correlation (linkage disequilibrium) among genetic variants, and allows for signal regions to have both causal and neutral variants, and the effects of signal variants to be in different directions. We study the asymptotic properties of the proposed Q-SCAN statistics. We derive an empirical threshold that controls for the family-wise error rate, and show that under regularity conditions the proposed method consistently selects the true signal regions. We perform simulation studies to evaluate the finite sample performance of the proposed method. Our simulation results show that the proposed procedure outperforms the existing methods, especially when signal regions have causal variants whose effects are in different directions, or are contaminated with neutral variants. We illustrate Q-SCAN by analyzing the WGS data from the Atherosclerosis Risk in Communities study. Supplementary materials for this article are available online.
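The scanning idea in the description can be illustrated with a minimal sketch. This is not the authors' Q-SCAN implementation: it assumes one score statistic per variant that is independent and standard normal under the null, so it ignores the linkage disequilibrium that the actual method accounts for, and it does not compute the family-wise error rate threshold. The function name `quadratic_scan` and the toy data are hypothetical. The sketch sums squared scores over every window of a given length range, standardizes each sum by its chi-square mean and standard deviation, and reports the window with the largest standardized value.

```python
from itertools import accumulate
from math import sqrt


def quadratic_scan(scores, min_len, max_len):
    """Return (statistic, start, end) for the window [start, end) that
    maximizes the standardized sum of squared scores.

    Assumption (not from the source): scores are independent N(0, 1)
    under the null, so a window of length L has null mean L and
    variance 2L for its sum of squares.
    """
    # Prefix sums of squared scores give any window's sum in O(1).
    csum = [0.0] + list(accumulate(s * s for s in scores))
    p = len(scores)
    best = (float("-inf"), -1, -1)
    for length in range(min_len, min(max_len, p) + 1):
        scale = sqrt(2.0 * length)
        for start in range(p - length + 1):
            q = csum[start + length] - csum[start]  # quadratic statistic
            z = (q - length) / scale                # standardized value
            if z > best[0]:
                best = (z, start, start + length)
    return best


# Toy input: background scores of magnitude 1, plus a 20-variant region
# whose effects alternate in sign (squaring makes the sign irrelevant).
scores = [1.0] * 300
for j in range(100, 120):
    scores[j] = 4.0 if j % 2 == 0 else -4.0

stat, start, end = quadratic_scan(scores, min_len=5, max_len=50)
print(start, end)  # 100 120
```

Because the statistic is quadratic, effects in opposite directions do not cancel, which is the property the description highlights. Standardizing by window length keeps longer windows from winning simply by accumulating more null variants.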
Provider:
Taylor & Francis
Created:
2020-09-14



