
Data from: Identifying consistent allele frequency differences in studies of stratified populations

DataONE · 2017-05-17 (updated) · 2024-06-26 (indexed)
Download link:
https://search.dataone.org/view/null
Abstract:
1. With increasing application of pooled-sequencing approaches to population genomics, robust methods are needed to accurately quantify allele frequency differences between populations. Identifying consistent differences across stratified populations can allow us to detect genomic regions that are under selection or that differ between populations with different histories or attributes. Current popular statistical tests are easily implemented in widely available software tools, which makes them simple for researchers to apply. However, there are potential problems with the way such tests are used, which means that underlying assumptions about the data are frequently violated.
2. These problems are highlighted by simulations of simple but realistic population genetic models of neutral evolution, and the performance of different tests is assessed. We present alternative tests (including GLMs with a quasibinomial error structure) with attractive properties for the analysis of allele frequency differences, and re-analyse a published dataset.
3. The simulations show that common statistical tests for consistent allele frequency differences perform poorly, with high false positive rates. Applying tests that do not confound heterogeneity and main effects significantly improves inference. Variation in sequencing coverage likely produces many false positives, and re-scaling allele frequencies to counts out of a common value or an effective sample size reduces this effect.
4. Many researchers are interested in identifying allele frequencies that vary consistently across replicates, in order to identify loci underlying phenotypic responses to selection or natural variation in phenotypes. Popular methods that have been suggested for this task perform poorly in simulations. Overall, quasibinomial GLMs perform better. They also allow correction for multiple testing by standard procedures and are easily extended to other designs.
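
The abstract names the recommended approach but does not give its exact specification. The following is a minimal, standard-library-only Python sketch of one way to set up a quasibinomial GLM test for a consistent allele frequency difference at a single SNP. It is not the authors' code, and it rests on several assumptions. Each replicate (stratum) contributes one pooled sample per group (g = 0 or 1). Allele counts are rescaled to a common total `target` before fitting, and the value of 100 is an arbitrary choice. The full model is logit(p) ~ replicate + group, and the reduced model drops `group`. The dispersion is Pearson X² divided by the full model's residual degrees of freedom. The F statistic is the deviance difference divided by that dispersion, roughly what R's `anova(..., test = "F")` reports for `quasibinomial` fits. All function names (`rescale_counts`, `quasibinomial_f_test` and the helpers) are hypothetical.

```python
import math


def rescale_counts(alt, depth, target):
    """Turn `alt` reads out of `depth` into (alt, ref) counts out of `target`.
    `target` is a common total, or a per-sample effective size computed by the caller."""
    freq = alt / depth
    return freq * target, (1.0 - freq) * target


def _solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _inv_logit_fit(X, beta):
    """Fitted proportions: inverse logit of the linear predictor X beta."""
    return [1.0 / (1.0 + math.exp(-sum(x * b for x, b in zip(row, beta)))) for row in X]


def _fit_logit(X, succ, total, iters=100, tol=1e-10):
    """Fit a logit-link binomial GLM by IRLS; return the fitted proportions."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = _inv_logit_fit(X, beta)
        w = [n * m * (1 - m) for n, m in zip(total, mu)]
        # Working response: eta + (y/n - mu) / (mu (1 - mu)) for the logit link.
        z = [math.log(m / (1 - m)) + (y / n - m) / (m * (1 - m))
             for y, n, m in zip(succ, total, mu)]
        xtwx = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(wi * row[j] * zi for wi, row, zi in zip(w, X, z)) for j in range(p)]
        new = _solve(xtwx, xtwz)
        converged = max(abs(u - v) for u, v in zip(new, beta)) < tol
        beta = new
        if converged:
            break
    return _inv_logit_fit(X, beta)


def _dev_term(obs, fit):
    """obs * log(obs / fit), using the convention 0 * log 0 = 0."""
    return 0.0 if obs <= 0 else obs * math.log(obs / fit)


def _deviance(succ, total, mu):
    """Binomial deviance of the fitted proportions `mu`."""
    return 2.0 * sum(_dev_term(y, n * m) + _dev_term(n - y, n * (1 - m))
                     for y, n, m in zip(succ, total, mu))


def _pearson_x2(succ, total, mu):
    """Pearson X^2 = sum (y - n mu)^2 / (n mu (1 - mu))."""
    return sum((y - n * m) ** 2 / (n * m * (1 - m)) for y, n, m in zip(succ, total, mu))


def quasibinomial_f_test(alt_counts, depths, target=100.0):
    """F test of a consistent group effect; [i][g] indexes replicate i, group g in {0, 1}.

    Full: logit(p) ~ replicate + group. Reduced: logit(p) ~ replicate.
    Replicate-by-group heterogeneity stays in the residual and is absorbed by
    the dispersion (Pearson X^2 / residual df of the full model).
    """
    k = len(alt_counts)
    X_full, X_red, succ, total = [], [], [], []
    for i in range(k):
        for g in (0, 1):
            a, r = rescale_counts(alt_counts[i][g], depths[i][g], target)
            succ.append(a)
            total.append(a + r)
            row = [1.0] + [1.0 if i == j else 0.0 for j in range(1, k)]
            X_red.append(row)
            X_full.append(row + [float(g)])
    mu_full = _fit_logit(X_full, succ, total)
    mu_red = _fit_logit(X_red, succ, total)
    df_resid = len(succ) - len(X_full[0])
    phi = _pearson_x2(succ, total, mu_full) / df_resid
    f_stat = (_deviance(succ, total, mu_red) - _deviance(succ, total, mu_full)) / phi
    return f_stat, 1, df_resid
```

For example, `quasibinomial_f_test([[30, 52], [45, 70], [20, 41]], [[100, 100], [90, 110], [80, 95]])` returns an F statistic on 1 and 2 degrees of freedom for three replicates in which the second group has the higher frequency every time. This gives a large F. If the direction of the difference flips between replicates, F stays small, because the inconsistency inflates the dispersion rather than the group effect. The p-value is the upper tail of the F(1, df) distribution, for example `scipy.stats.f.sf(f_stat, 1, df)`. Per-SNP p-values can then go through a standard multiple-testing correction. In practice, a library fit such as R's `glm(..., family = quasibinomial)` or statsmodels' binomial `GLM(...).fit(scale="X2")` would replace the hand-written IRLS loop. The loop is kept here only so that the sketch has no dependencies.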

Created:
2017-05-17