Data from: Minor allele frequency thresholds strongly affect population structure inference with genomic datasets
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.hr1hh75
Description:
One common method of minimizing errors in large DNA sequence datasets is
to drop variable sites with a minor allele frequency below some specified
threshold. Though widespread, this procedure has the potential to alter
downstream population genetic inferences and has received relatively
little rigorous analysis. Here we use simulations and an empirical SNP
dataset to demonstrate the impacts of minor allele frequency (MAF)
thresholds on inference of population structure. We find that model-based
inference of population structure is confounded when singletons are
included in the alignment, and that both model-based and multivariate
analyses infer less distinct clusters when more stringent MAF cutoffs are
applied. We propose that this behavior is caused by the combination of a
drop in the total size of the data matrix and correlations between
allele frequencies and mutational age. We recommend a set of best
practices for applying MAF filters in studies seeking to describe
population structure with genomic data.
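
The filtering step the description refers to can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes biallelic sites in diploid individuals, with each genotype coded as the number of alternate-allele copies (0, 1, or 2) and None marking a missing call. The function names and the toy data are hypothetical.

```python
def minor_allele_count(genotypes):
    """Return (minor allele count, total called alleles) for one site.

    Assumes diploid, biallelic data: each entry is the number of
    alternate-allele copies (0, 1, or 2), or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    total = 2 * len(called)
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt, total - alt), total


def filter_by_maf(sites, min_maf=0.0, min_mac=0):
    """Keep variable sites that meet both a frequency and a count threshold.

    min_maf is a minimum minor allele frequency; min_mac is a minimum
    minor allele count (min_mac=2 drops singletons).
    """
    kept = []
    for site in sites:
        mac, total = minor_allele_count(site)
        if total == 0 or mac == 0:
            continue  # no called genotypes, or the site is not variable
        if mac >= min_mac and mac / total >= min_maf:
            kept.append(site)
    return kept


# Toy data: rows are sites, columns are individuals (hypothetical values).
sites = [
    [0, 0, 0, 1, 0],     # singleton: 1 minor copy out of 10 alleles
    [0, 1, 1, 2, 0],     # 4 minor copies out of 10 alleles (MAF 0.4)
    [2, 2, 2, 2, 1],     # singleton on the reference allele
    [0, 0, 0, 0, 0],     # monomorphic, never retained
    [1, None, 0, 2, 1],  # one missing call: 4 of 8 alleles (MAF 0.5)
]

no_singletons = filter_by_maf(sites, min_mac=2)
strict = filter_by_maf(sites, min_maf=0.45)
print(len(sites), len(no_singletons), len(strict))
```

Raising either threshold shrinks the data matrix, which is one of the two effects the description proposes as a cause of less distinct clusters under stringent cutoffs.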
Provider:
Dryad
Created:
2019-02-04



