Data from: Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species
Source: DataONE · 2017-01-23 · updated 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
High-throughput DNA sequencing facilitates the analysis of large portions of the genome in non-model organisms, ensuring highly accurate estimates of population genetic parameters. However, empirical studies evaluating the appropriate sample size for these kinds of studies are still scarce. In this study, we use double-digest restriction-site associated DNA sequencing (ddRADseq) to recover thousands of single nucleotide polymorphisms (SNPs) for two physically isolated populations of Amphirrhox longifolia (Violaceae), a non-model plant species for which no reference genome is available. We used resampling techniques to construct simulated populations with a random subset of individuals and SNPs to determine how many individuals and bi-allelic markers should be sampled for accurate estimates of intra- and interpopulation genetic diversity. We identified 3,646 and 4,900 polymorphic SNPs for the two populations of A. longifolia, respectively. Our simulations show that, overall, a sample size greater than eight individuals has little impact on estimates of genetic diversity within A. longifolia populations when 1,000 or more SNPs are used. Our results also show that even at a very small sample size (i.e., two individuals), accurate estimates of FST can be obtained with a large number of SNPs (≥ 1,500). These results highlight the potential of high-throughput genomic sequencing approaches to address questions related to evolutionary biology in non-model organisms. Our findings also provide insights into the optimization of sampling strategies in the era of population genomics.
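The resampling design described above (repeatedly drawing random subsets of individuals and SNPs, then recomputing diversity and FST) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the description does not name its estimators, so the sketch assumes Nei's unbiased expected heterozygosity for within-population diversity and Hudson's FST in the ratio-of-averages form. Genotypes are assumed to be coded 0/1/2 (alternate-allele counts) over SNP columns shared by both populations, and all function names and the synthetic data are hypothetical.

```python
import random
import statistics

# Assumed layout: list of individuals, each a list of 0/1/2 alternate-allele counts per SNP.
# Both populations are assumed to share the same SNP columns (hypothetical simplification).


def allele_freq(genos, snp):
    """Alternate-allele frequency at one SNP in a diploid sample."""
    return sum(ind[snp] for ind in genos) / (2 * len(genos))


def expected_het(genos, snps):
    """Mean unbiased expected heterozygosity (Nei's n/(n-1) correction) over the given SNPs."""
    n = 2 * len(genos)  # number of sampled gene copies
    total = 0.0
    for s in snps:
        p = allele_freq(genos, s)
        total += n / (n - 1) * 2 * p * (1 - p)
    return total / len(snps)


def hudson_fst(pop1, pop2, snps):
    """Hudson's FST as a ratio of per-SNP numerator and denominator sums."""
    n1, n2 = 2 * len(pop1), 2 * len(pop2)  # gene copies per population
    num = den = 0.0
    for s in snps:
        p1, p2 = allele_freq(pop1, s), allele_freq(pop2, s)
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        return float("nan")  # every sampled SNP is monomorphic in both subsets
    return num / den


def resample(pop1, pop2, n_ind, n_snp, reps, seed=0):
    """Draw `reps` random subsets of n_ind individuals per population and n_snp SNPs.

    Returns a list of (He of pop1 subset, Hudson FST) tuples, one per replicate.
    """
    rng = random.Random(seed)
    n_total_snps = len(pop1[0])
    results = []
    for _ in range(reps):
        snps = rng.sample(range(n_total_snps), n_snp)
        sub1 = rng.sample(pop1, n_ind)
        sub2 = rng.sample(pop2, n_ind)
        results.append((expected_het(sub1, snps), hudson_fst(sub1, sub2, snps)))
    return results


if __name__ == "__main__":
    # Hypothetical synthetic data: 20 individuals per population, 2,000 shared SNPs,
    # both populations drawn from the same allele frequencies (so FST should be near 0).
    rng = random.Random(42)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(2000)]

    def simulate(n):
        return [[sum(rng.random() < f for _ in range(2)) for f in freqs] for _ in range(n)]

    pop_a, pop_b = simulate(20), simulate(20)
    for n_ind in (2, 4, 8, 16):
        reps = resample(pop_a, pop_b, n_ind, 1000, reps=50)
        mean_he = statistics.mean(r[0] for r in reps)
        mean_fst = statistics.mean(r[1] for r in reps)
        print(f"n={n_ind:2d}  mean He={mean_he:.3f}  mean FST={mean_fst:.3f}")
```

Comparing the spread of estimates across replicates for each combination of individuals and SNPs is one way to see where adding more samples stops changing the result. Real ddRADseq data would first need filtering to SNPs genotyped in both populations, which this sketch does not attempt.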
Created:
2017-01-23



