Data from: How does ascertainment bias in SNP analyses affect inferences about population history?
Source: DataONE | Updated: 2015-04-08 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Background: The selection of variable sites for inclusion in genomic analyses can influence results, especially when exemplar populations are used to determine polymorphic sites. We tested the impact of ascertainment bias on the inference of population genetic parameters using empirical and simulated data representing the three major continental groups of cattle: European, African, and Indian. We simulated data under three demographic models. Each simulated data set was subjected to three ascertainment schemes: (I) random selection; (II) geographically biased selection; and (III) selection biased toward loci polymorphic in multiple groups. Empirical data comprised samples of 25 individuals representing each continental group. These cattle were genotyped for 47,506 loci from the bovine 50 K SNP panel. We compared the inference of population histories for the empirical and simulated data sets across different ascertainment conditions using FST and principal components analysis (PCA).
Results: Bias toward shared polymorphism across continental groups is apparent in the empirical SNP data. Bias toward uneven levels of within-group polymorphism decreases estimates of FST between groups. Subpopulation-biased selection of SNPs changes the weighting of principal component axes and can affect inferences about proportions of admixture and population histories using PCA. PCA-based inferences of population relationships are largely congruent across types of ascertainment bias, even when ascertainment bias is strong.
Conclusions: Analyses of ascertainment bias in genomic data have largely been conducted on human data. As genomic analyses are being applied to non-model organisms, and across taxa with deeper divergences, care must be taken to consider the potential for bias in ascertainment of variation to affect inferences. Estimates of FST, time of separation, and population divergence as estimated by principal components analysis can be misleading if this bias is not taken into account.
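The three ascertainment schemes described above can be illustrated with a toy simulation. The sketch below is not the authors' pipeline: it assumes only two populations rather than three, draws population allele frequencies from a Balding-Nichols-style beta distribution, and uses Hudson's ratio-of-averages estimator for FST, none of which is specified in this record. The function names (`simulate_locus`, `hudson_fst`, `fst_by_scheme`) and all parameter defaults are hypothetical.

```python
import random


def simulate_locus(rng, f_true, n_chrom):
    """Return sampled allele counts (pop A, pop B) at one locus.

    Assumption: population frequencies scatter around an ancestral
    frequency following a beta distribution with divergence f_true.
    """
    p_anc = rng.uniform(0.05, 0.95)
    a = p_anc * (1 - f_true) / f_true
    b = (1 - p_anc) * (1 - f_true) / f_true
    counts = []
    for _ in range(2):
        p_pop = rng.betavariate(a, b)
        # Binomial sampling of n_chrom chromosomes from the population.
        counts.append(sum(rng.random() < p_pop for _ in range(n_chrom)))
    return tuple(counts)


def hudson_fst(loci, n_chrom):
    """Hudson's FST as a ratio of sums over loci (assumed estimator)."""
    num = den = 0.0
    for c1, c2 in loci:
        p1, p2 = c1 / n_chrom, c2 / n_chrom
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n_chrom - 1)
                - p2 * (1 - p2) / (n_chrom - 1))
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


def fst_by_scheme(n_loci=2000, n_chrom=50, f_true=0.1, seed=1):
    """Estimate FST under three toy ascertainment schemes."""
    rng = random.Random(seed)
    loci = [simulate_locus(rng, f_true, n_chrom) for _ in range(n_loci)]

    def polymorphic(count):
        return 0 < count < n_chrom

    schemes = {
        # (I) random subset of all simulated loci
        "random": rng.sample(loci, n_loci // 2),
        # (II) loci discovered as polymorphic in population A only
        "geographic": [l for l in loci if polymorphic(l[0])],
        # (III) loci polymorphic in both populations
        "shared": [l for l in loci if polymorphic(l[0]) and polymorphic(l[1])],
    }
    return {name: hudson_fst(sel, n_chrom) for name, sel in schemes.items()}


if __name__ == "__main__":
    for name, value in fst_by_scheme().items():
        print(f"{name:10s} FST = {value:.4f}")
```

The default of 50 chromosomes per population mirrors the 25 diploid individuals per group mentioned in the description. Comparing the three printed values shows how the choice of loci alone can shift an FST estimate computed from the same underlying populations.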
Created:
2015-04-08



