Accounting for Population Stratification in Practice: A Comparison of the Main Strategies Dedicated to Genome-Wide Association Studies

Figshare · updated 2016-01-18 · indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/Accounting_for_Population_Stratification_in_Practice_A_Comparison_of_the_Main_Strategies_Dedicated_to_Genome_Wide_Association_Studies/130410

Description:

Genome-Wide Association Studies are powerful tools for detecting genetic variants associated with diseases. Their results have, however, been questioned, in part because of the bias induced by population stratification: systematic differences in allele frequencies arising from differences in sample ancestry, which can lead to either false positive or false negative findings. Many strategies are available to account for stratification, but their performance differs according to, for instance, the type of population structure, the minor allele frequency of the disease susceptibility locus, the degree of sampling imbalance, and the sample size. We focus on the type of population structure and compare the most commonly used methods for dealing with stratification: Genomic Control, Principal Component based methods such as the one implemented in Eigenstrat, adjusted Regressions, and Meta-Analysis strategies. Our assessment is based on a large simulation study involving several scenarios that correspond to many types of population structure. We examined both false positive rate and power to determine which methods perform best. Our analysis showed that, when there is no population structure, none of the tests introduced bias or decreased power, except for the Meta-Analyses. When the population is stratified, adjusted Logistic Regressions and Eigenstrat are the best solutions, although only the Logistic Regressions consistently maintain correct false positive rates. This study provides more details about these methods. Their advantages and limitations in different stratification scenarios are highlighted in order to propose practical guidelines for accounting for population stratification in Genome-Wide Association Studies.
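
As a rough illustration of the simplest of the methods named above, Genomic Control, the sketch below estimates an inflation factor from the median of 1-df chi-square association statistics and rescales the statistics by it. It is not taken from the dataset or the study: the function name `genomic_control`, the simulated statistics, the inflation value of 1.3, and the choice to floor the factor at 1 are all assumptions made for this example.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, rescaled statistics) for 1-df chi-square tests.

    Hypothetical helper for illustration only. The factor is floored at 1
    before rescaling so that statistics are never inflated (an assumption).
    """
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = np.median(chi2_stats) / CHI2_1DF_MEDIAN
    return lam, chi2_stats / max(lam, 1.0)


# Simulated example: null statistics uniformly inflated by a factor of 1.3,
# standing in for the effect of unaccounted population stratification.
rng = np.random.default_rng(0)
null_stats = rng.chisquare(df=1, size=100_000)
inflated_stats = 1.3 * null_stats

lam, corrected_stats = genomic_control(inflated_stats)
print(f"estimated inflation factor: {lam:.3f}")
```

A single genome-wide factor of this kind rescales every marker by the same amount, which is one reason its behavior can differ from that of per-marker adjustments such as principal component or regression-based corrections.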

Created:

2016-01-18