Supplementary Material for: Systematic Removal of Outliers to Reduce Heterogeneity in Case-Control Association Studies

Source: DataCite Commons. Updated 2020-09-02; indexed 2024-07-25.
Download link:
https://karger.figshare.com/articles/dataset/Supplementary_Material_for_Systematic_Removal_of_Outliers_to_Reduce_Heterogeneity_in_Case-Control_Association_Studies/5121415
Description:
<i>Background/Aims:</i> In human case-control association studies, population heterogeneity is often present and can lead to an increased rate of false-positive results. Various methods have been proposed and are in current use to remedy this situation. <i>Methods:</i> We assume that heterogeneity is due to a relatively small number of individuals whose allele frequencies differ from those of the remainder of the sample. For this situation, we propose a new method of handling heterogeneity by removing outliers in a controlled manner. In a coordinate system of the <i>c</i> largest principal components in multidimensional scaling (MDS), we systematically remove the most extreme outlying individuals one at a time and, after each removal, recompute the largest association test statistic. The smallest p value obtained within <i>M</i> removals serves as our test statistic, whose significance level is assessed in randomization samples. <i>Results:</i> In power simulations comparing our method with three methods in current use, averaged over several different scenarios, the best method turned out to be logistic regression analysis (based on all individuals) with MDS components as covariates. <i>Conclusion:</i> Our proposed method ranked closely behind logistic regression analysis with MDS components but ahead of the other commonly used approaches. In analyses of real datasets, our method performed best.
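The removal-and-retest procedure described under Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed coded as 0/1/2 allele counts, the MDS coordinates (first c components) are assumed precomputed and passed in, "most outlying" is assumed to mean the largest Euclidean distance from the centroid of the individuals still in the sample, the per-SNP association test is assumed to be a 1-df allelic chi-square test, the statistic is assumed to range over 0 to M removals, and randomization samples are assumed to be label permutations. All function names are hypothetical.

```python
import math
import random


def allelic_chi2_pvalue(genos, labels, idx, snp):
    """1-df allelic chi-square p value for one SNP, using only individuals in idx.

    Assumption: genos[i][snp] is the count (0, 1 or 2) of one allele.
    """
    a = b = c = d = 0  # case allele, case other, control allele, control other
    for i in idx:
        g = genos[i][snp]
        if labels[i]:  # 1 = case, 0 = control
            a += g
            b += 2 - g
        else:
            c += g
            d += 2 - g
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 1.0  # monomorphic SNP or an empty group: no evidence of association
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / den
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def removal_order(coords, m):
    """Indices of the m individuals removed one after another.

    Assumption: the most outlying individual is the one farthest (Euclidean)
    from the centroid of those still in the sample; the centroid is
    recomputed after every removal.
    """
    if m >= len(coords):
        raise ValueError("m must be smaller than the number of individuals")
    remaining = set(range(len(coords)))
    order = []
    for _ in range(m):
        centroid = [sum(coords[i][k] for i in remaining) / len(remaining)
                    for k in range(len(coords[0]))]
        # sorted() makes tie-breaking deterministic (lowest index wins)
        worst = max(sorted(remaining), key=lambda i: math.dist(coords[i], centroid))
        remaining.remove(worst)
        order.append(worst)
    return order


def min_p_statistic(genos, labels, order):
    """Smallest p value over all SNPs and over 0, 1, ..., M removals.

    With the same 1-df test for every SNP, the smallest p value corresponds
    to the largest association test statistic.
    """
    n_snps = len(genos[0])
    idx = set(range(len(genos)))
    best = min(allelic_chi2_pvalue(genos, labels, idx, s) for s in range(n_snps))
    for i in order:
        idx.remove(i)  # drop the next outlier, then retest every SNP
        best = min(best, min(allelic_chi2_pvalue(genos, labels, idx, s)
                             for s in range(n_snps)))
    return best


def outlier_removal_test(genos, labels, coords, m, n_perm=1000, seed=1):
    """Return (observed min-p statistic, randomization p value)."""
    order = removal_order(coords, m)  # uses only MDS coordinates, not labels
    observed = min_p_statistic(genos, labels, order)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # randomization sample: permute case/control status
        if min_p_statistic(genos, perm, order) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, `outlier_removal_test(genos, labels, coords, m=10, n_perm=1000)` returns the observed minimum p value together with its randomization p value. Because the removal order in this sketch depends only on the MDS coordinates and not on case/control status, it is computed once and reused for every randomization sample; an outlier criterion that also used the labels would have to be recomputed inside the permutation loop.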

Provided by:
Karger Publishers
Created:
2017-06-20