
Filtered SNP set used for population structure and clustering analysis

Source: DataONE. Updated: 2018-02-09. Indexed: 2024-06-25.
Download link:
https://search.dataone.org/view/null
Description:
These SNPs were called from 109 Metschnikowia sp. whole genomes mapped to the MR_a10 reference genome, following the GATK best practices guidelines. Initial SNPs were hard-filtered with the GATK VariantFiltration tool (v3.4) and VCFtools (v1.5) as per best practices (Danecek et al. 2011), using the following parameters: base quality = 20, quality by depth = 2.0, mapping quality = 30, Fisher strand bias = 60, mapping quality rank sum = -12.5, and ReadPosRankSum = -8.0. After indel removal, the set of 1.27 million SNPs across 109 strains was further filtered to exclude non-biallelic SNPs, SNPs with a minor allele frequency below 0.05, and polymorphisms with more than 50% missing data. To resolve SNPs in linkage, a sliding window of 50 SNPs, advanced 5 SNPs at a time, and an r² threshold of 0.5 were used. The final set of high-confidence SNPs consisted of 88,192 polymorphisms. See attached scripts.
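The site-level filters and the windowed linkage step described above can be illustrated with a minimal Python sketch. It is not the attached scripts, and the description does not say which tool performed the linkage step; the windowed pairwise scheme here resembles what pruning tools such as PLINK offer. The function names, the one-allele-code-per-strain encoding (integers, `None` for missing), and the rule of dropping the later SNP of a linked pair are all assumptions made for illustration.

```python
from itertools import combinations


def passes_site_filters(calls, min_maf=0.05, max_missing=0.5):
    """Return True if a site is biallelic, common enough, and well genotyped.

    `calls` holds one allele code per strain (hypothetical encoding),
    with None marking a missing call.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return False
    # Exclude sites with more than 50% missing data.
    if 1 - len(observed) / len(calls) > max_missing:
        return False
    # Exclude non-biallelic (and monomorphic) sites.
    alleles = set(observed)
    if len(alleles) != 2:
        return False
    # Exclude sites whose minor allele frequency is below the threshold.
    minor_count = min(observed.count(a) for a in alleles)
    return minor_count / len(observed) >= min_maf


def r_squared(a, b):
    """Squared Pearson correlation over strains called at both sites."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def ld_prune(sites, window=50, step=5, max_r2=0.5):
    """Return indices of sites kept after windowed pairwise pruning.

    Assumption: when a pair exceeds max_r2, the later site is dropped.
    """
    keep = [True] * len(sites)
    start = 0
    while start < len(sites):
        end = min(start + window, len(sites))
        in_window = [i for i in range(start, end) if keep[i]]
        for i, j in combinations(in_window, 2):
            if keep[i] and keep[j] and r_squared(sites[i], sites[j]) > max_r2:
                keep[j] = False
        start += step
    return [i for i, kept in enumerate(keep) if kept]
```

In this sketch a site list would first be reduced with `passes_site_filters` and the survivors then passed to `ld_prune`; the defaults mirror the thresholds stated in the description (0.05, 50%, 50 SNPs, 5 SNPs, 0.5).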
Created:
2018-02-09