Data from: Properties of different selection signature statistics and a new strategy for combining them
DataONE. Updated 2015-04-09; indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Identifying signatures of recent or ongoing selection is of high relevance in livestock population genomics. From a statistical perspective, determining a proper testing procedure and combining various test statistics is challenging. On the basis of extensive simulations in this study, we discuss the statistical properties of eight different established selection signature statistics. In the considered scenario, we show that a reasonable power to detect selection signatures is achieved with high marker density (>1 SNP/kb) as obtained from sequencing, while rather small sample sizes (~15 diploid individuals) appear to be sufficient. Most selection signature statistics, such as composite likelihood ratio and cross-population extended haplotype homozygosity, have the highest power when fixation of the selected allele is reached, while integrated haplotype score has the highest power when selection is ongoing. We suggest a novel strategy, called de-correlated composite of multiple signals (DCMS), to combine different statistics for detecting selection signatures while accounting for the correlation between the different selection signature statistics. When examined with simulated data, DCMS consistently has a higher power than most of the single statistics and shows a reliable positional resolution. We apply the new statistic to the established selective sweep around the lactase gene in human HapMap data, which provides further evidence of the reliability of this new statistic. Then, we apply it to scan for selection signatures in two chicken samples that differ in skin color. Our analysis suggests that a set of well-known genes, such as BCO2, MC1R, ASIP and TYR, was involved in the divergent selection for this trait.
Created:
2015-04-09



