Data from: Identifying consistent allele frequency differences in studies of stratified populations

Source: DataONE. Updated 2017-05-17; indexed 2024-06-26.

Download link:

https://search.dataone.org/view/null

Description:

1. With the increasing application of pooled-sequencing approaches to population genomics, robust methods are needed to accurately quantify allele frequency differences between populations. Identifying consistent differences across stratified populations can allow us to detect genomic regions that are under selection and that differ between populations with different histories or attributes. Current popular statistical tests are easily implemented in widely available software tools, which makes them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated.

2. These problems are highlighted by simulation of simple but realistic population genetic models of neutral evolution, and the performance of different tests is assessed. We present alternative tests (including GLMs with quasibinomial error structure) with attractive properties for the analysis of allele frequency differences and re-analyse a published dataset.

3. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives, and re-scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect.

4. Many researchers are interested in identifying allele frequencies that vary consistently across replicates in order to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. Overall, quasibinomial GLMs perform better; they also have the attractive feature of allowing correction for multiple testing by standard procedures, and they are easily extended to other designs.
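The re-scaling step mentioned in point 3 can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `rescale_counts`, the example read counts, and the common value of 40 are all hypothetical. The description does not say how the common value or effective sample size is chosen, so the caller supplies it here.

```python
from typing import Tuple


def rescale_counts(alt_reads: int, depth: int, target: int) -> Tuple[int, int]:
    """Re-express an observed allele frequency as (alt, ref) counts out of `target`.

    `target` is an assumption supplied by the caller, e.g. a common value
    shared by all samples or an effective sample size computed elsewhere.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    if target <= 0:
        raise ValueError("target must be positive")
    freq = alt_reads / depth
    # Round half up so the result does not depend on Python's banker's rounding.
    alt = int(freq * target + 0.5)
    return alt, target - alt


# Hypothetical (alt reads, total depth) pairs with very different coverage.
samples = [(30, 60), (150, 300), (9, 45)]

# After re-scaling, every sample contributes counts out of the same total,
# so high-coverage samples no longer carry inflated apparent precision.
rescaled = [rescale_counts(alt, depth, target=40) for alt, depth in samples]
print(rescaled)
```

The first two samples have the same allele frequency but a five-fold difference in depth; once re-scaled they yield identical counts, which is the property that a count-based test applied afterwards relies on.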

Created:

2017-05-17
