Data from: Subsampling reveals that unbalanced sampling affects STRUCTURE results in a multi-species dataset
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.nh4366s
Description:
Studying the genetic population structure of species can reveal important
insights into several key evolutionary, historical, demographic, and
anthropogenic processes. One of the most important statistical tools for
inferring genetic clusters is the program STRUCTURE. Recently, several
papers have pointed out that STRUCTURE may show a bias when the sampling
design is unbalanced, resulting in spurious joining of underrepresented
populations and spurious separation of overrepresented populations.
Suggestions to overcome this bias include subsampling and changing the
ancestry model, but the performance of these two methods has not yet been
tested on actual data. Here, I use a dataset of twelve high-alpine plant
species to test whether unbalanced sampling affects the STRUCTURE
inference of population differentiation between the European Alps and the
Carpathians. For four of the twelve species, subsampling of the Alpine
populations (to match the sample size between the Alps and the
Carpathians) resulted in a drastically different clustering than the full
dataset. On the other hand, STRUCTURE results with the alternative
ancestry model were indistinguishable from the results with the default
model. Based on these results, the subsampling strategy seems a more
viable approach to overcome the bias than the alternative ancestry model.
However, subsampling is only possible when there is an a priori
expectation of what constitutes the main clusters. Though these results do
not mean that the use of STRUCTURE should be discarded, they do indicate
that users of the software should be cautious about the interpretation of
the results when sampling is unbalanced.
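
The subsampling idea described above can be illustrated with a minimal Python sketch. It is not the author's actual procedure: the function name `balance_by_region`, the `(individual_id, region)` tuple layout, and the example sample sizes (30 Alpine, 8 Carpathian) are hypothetical, and the sketch draws individuals at random, whereas the study may have subsampled whole populations.

```python
import random
from collections import defaultdict


def balance_by_region(samples, seed=0):
    """Randomly subsample each region down to the size of the smallest one.

    samples: list of (individual_id, region) tuples (hypothetical layout).
    Returns a new list with the same number of individuals per region.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group individual IDs by their a priori region label.
    by_region = defaultdict(list)
    for individual_id, region in samples:
        by_region[region].append(individual_id)

    # The smallest region sets the target sample size for all regions.
    target = min(len(ids) for ids in by_region.values())

    balanced = []
    for region in sorted(by_region):
        chosen = rng.sample(by_region[region], target)  # without replacement
        balanced.extend((individual_id, region) for individual_id in sorted(chosen))
    return balanced


# Hypothetical unbalanced dataset: 30 Alpine vs. 8 Carpathian individuals.
samples = [(f"alps_{i}", "Alps") for i in range(30)]
samples += [(f"carp_{i}", "Carpathians") for i in range(8)]

balanced = balance_by_region(samples)
```

The balanced list would then be used to filter the genotype input file before running STRUCTURE. As the abstract notes, this only works when the main clusters (here, the two mountain ranges) are expected a priori, since the region labels drive the subsampling.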
Provider:
Dryad
Created:
2018-07-10



