Erratum: A New Expectation-Maximization Statistical Test for Case-Control Association Studies Considering Rare Variants Obtained by High-Throughput Sequencing

Name: Erratum: A New Expectation-Maximization Statistical Test for Case-Control Association Studies Considering Rare Variants Obtained by High-Throughput Sequencing
Creator: Karger Publishers
Published: 2020-09-01 13:26:36
License: No description available

DataCite Commons | Updated 2020-09-01 | Indexed 2024-07-25

Download link:

https://karger.figshare.com/articles/dataset/Erratum_A_New_Expectation-Maximization_Statistical_Test_for_Case-Control_Association_Studies_Considering_Rare_Variants_Obtained_by_High-Throughput_Sequencing/5241280/1

Description:

Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk, and even after meta-analysis of large cohorts a large fraction of the expected heritability remains unexplained. A possible explanation is that rare variants, currently undetected by GWAS with SNP arrays, could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples to test for disease association, including with rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single-locus results.
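To make the expectation-maximization step in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a single biallelic locus, a known and symmetric per-read error rate, and a three-component binomial mixture over genotypes carrying 0, 1, or 2 copies of the alternate allele. The function name `genotype_posteriors` and all of its parameters are hypothetical and do not come from the paper.

```python
import math


def genotype_posteriors(alt_counts, depths, error_rate=0.01, n_iter=200, tol=1e-10):
    """Illustrative EM for a binomial mixture over genotypes 0, 1, 2.

    alt_counts[i] is the number of reads showing the alternate allele for
    individual i, and depths[i] is that individual's total read count.
    Assumption (not from the paper): the read error rate is known and symmetric.
    """
    # Probability that one read shows the alternate allele, given the genotype.
    p_alt = [error_rate, 0.5, 1.0 - error_rate]
    freqs = [1.0 / 3.0] * 3  # initial guess for the genotype frequencies
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities for each individual.
        posteriors = []
        for a, n in zip(alt_counts, depths):
            weights = [
                freqs[g] * math.comb(n, a) * p_alt[g] ** a * (1.0 - p_alt[g]) ** (n - a)
                for g in range(3)
            ]
            total = sum(weights)
            posteriors.append([w / total for w in weights])
        # M-step: genotype frequencies become the average posterior weights.
        new_freqs = [sum(p[g] for p in posteriors) / len(posteriors) for g in range(3)]
        converged = max(abs(x - y) for x, y in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    # The posteriors returned are those from the final E-step.
    return freqs, posteriors
```

Calling `genotype_posteriors([0, 0, 0, 10, 5, 0], [10] * 6)` returns the estimated genotype frequencies together with one posterior vector per individual. A case-control test statistic would be built on top of such posteriors, which this sketch does not attempt.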

Provider:

Karger Publishers

Created:

2017-07-25
