filtered SNP set used for population structure and clustering analysis
Source: DataONE. Updated: 2018-02-09. Indexed: 2024-06-25.
Download link:
https://search.dataone.org/view/null
Description:
These SNPs were called from 109 Metschnikowia sp. whole genomes mapped to the MR_a10 reference genome following GATK best-practices guidelines. Initial SNPs were hard-filtered using the GATK VariantFiltration tool (v3.4) and VCFtools (v1.5) as per best practices (Danecek et al. 2011), with the following parameters: base quality = 20, quality by depth = 2.0, mapping quality = 30, Fisher strand bias = 60, mapping quality rank sum = -12.5, and ReadPosRankSum = -8.0. After InDel removal, the SNP set, consisting of 1.27 million SNPs across 109 strains, was further filtered to exclude non-bi-allelic SNPs, SNPs with a minor allele frequency below 0.05, and polymorphisms with more than 50% missing data. To resolve SNPs in linkage, a sliding window of 50 SNPs, advanced 5 SNPs at a time, and an r2 threshold of 0.5 were used. The final set of high-confidence SNPs consisted of 88,192 polymorphisms. See attached scripts.
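The site-level filters and the linkage step described above can be illustrated with a minimal Python sketch. This is not the attached scripts: the function names (`passes_site_filters`, `r_squared`, `ld_prune`), the 0/1/2 alternate-allele genotype coding with `None` for missing calls, and the diploid default are all assumptions made for illustration. The window/step/r2 scheme resembles what tools such as PLINK's `--indep-pairwise` do, but the description does not name the tool used, and the greedy rule here (drop the later SNP of a linked pair) is a simplification.

```python
def passes_site_filters(n_alt_alleles, genotypes, maf_min=0.05,
                        max_missing=0.5, ploidy=2):
    """Return True if a site is bi-allelic, has MAF >= maf_min, and has
    at most max_missing missing calls. `genotypes` holds alt-allele counts
    per strain (0..ploidy) or None for a missing call (assumed coding)."""
    if n_alt_alleles != 1:              # exclude non-bi-allelic sites
        return False
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    missing_fraction = 1 - len(called) / len(genotypes)
    if missing_fraction > max_missing:  # exclude >50% missing data
        return False
    alt_freq = sum(called) / (ploidy * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= maf_min               # exclude MAF below 0.05


def r_squared(x, y):
    """Squared Pearson correlation of two genotype vectors, using only
    strains where both calls are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:            # monomorphic among shared calls
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def ld_prune(snps, window=50, step=5, r2_max=0.5):
    """Greedy sliding-window pruning: within each window of `window` SNPs,
    drop the later SNP of any pair with r2 > r2_max, then advance by `step`.
    Returns the indices of the SNPs that are kept."""
    keep = [True] * len(snps)
    start = 0
    while start < len(snps):
        idx = [i for i in range(start, min(start + window, len(snps)))
               if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(snps[i], snps[j]) > r2_max:
                    keep[j] = False
        start += step
    return [i for i, kept in enumerate(keep) if kept]


if __name__ == "__main__":
    # Toy data: SNP 1 duplicates SNP 0, SNP 2 is only weakly correlated.
    toy = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 0, 1, 1]]
    print(ld_prune(toy))
```

On real data these steps would normally be run with established tools on the VCF itself; the sketch only makes the stated thresholds (MAF 0.05, 50% missingness, window 50, step 5, r2 0.5) concrete.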
Created:
2018-02-09



