Genotypes from two wood ant species and three hybrid populations

Source: Figshare (updated 2022-08-10; indexed 2026-04-08)
Download link:
https://figshare.com/articles/dataset/Genotypes_from_two_wood_ant_species_and_three_hybrid_populations/20464383/1
Description:
This VCF file contains 1,659,532 single nucleotide polymorphisms (SNPs) genotyped across 59 wood ant individuals. Please see the reference below for more details (including metadata).

Whole-genome sequencing was carried out on an Illumina NovaSeq 6000 (150 bp paired-end reads), targeting 15× coverage per individual. We trimmed raw Illumina reads and removed adapter sequences with TRIMMOMATIC (v0.38; parameters LEADING:3, TRAILING:3, MINLEN:36) and mapped the trimmed reads against the reference genome (10.1093/jhered/esac019) using BWA MEM (v0.7.17). We then removed duplicates using PICARD TOOLS (v2.21.4).

SNPs were jointly called across all samples with FREEBAYES (v1.3.1, population priors disabled with the -k option), and the resulting VCF file was normalized using VT (v0.5). Using BCFTOOLS (v1.10), we excluded sites located less than two base pairs from indels, as well as sites supported only by forward or only by reverse reads. We decomposed multi-nucleotide variants using vcfallelicprimitives from VCFLIB (v1.0.1).

The next steps were carried out with BCFTOOLS. Biallelic SNPs with a quality of 30 or higher were kept. Individual genotypes with (i) genotype quality lower than 30 and/or (ii) depth of coverage lower than eight were coded as missing data. Sites with more than 50% missing data across all samples were discarded. Genotyping errors due to, e.g., misaligned reads were removed using a filter based on excessive heterozygosity: following an approach similar to Pfeifer et al. (2018), we excluded sites showing heterozygote excess after pooling all samples (P < 0.01, --hardy command from VCFTOOLS v0.1.16). We then filtered on individual sequencing depth distributions, setting genotypes as missing where depth was lower than half or higher than twice the mean depth of the individual considered. Finally, sites with more than 15% missing data across all samples were discarded.
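The genotype- and site-level filters in this description can be illustrated with a small plain-Python sketch. It is not the authors' pipeline (they used BCFTOOLS); it assumes genotypes have already been parsed into site × sample matrices of alternate-allele counts (None for missing) with matching GQ and DP matrices, and every function name is hypothetical. The description does not say over which sites the per-individual mean depth is computed; here it is assumed to be the individual's non-missing calls.

```python
from statistics import mean
from typing import List, Optional

# Genotype = number of alternate alleles (0, 1, 2); None means missing.
Matrix = List[List[Optional[int]]]


def mask_low_quality(gt: Matrix, gq, dp, min_gq=30, min_dp=8) -> Matrix:
    """Set a genotype to missing when GQ < min_gq and/or DP < min_dp."""
    return [
        [None if (q < min_gq or d < min_dp) else g
         for g, q, d in zip(g_row, q_row, d_row)]
        for g_row, q_row, d_row in zip(gt, gq, dp)
    ]


def missing_fraction(row) -> float:
    """Fraction of individuals with a missing genotype at one site."""
    return sum(g is None for g in row) / len(row)


def keep_sites(gt: Matrix, max_missing: float) -> List[int]:
    """Indices of sites whose missing fraction does not exceed max_missing."""
    return [i for i, row in enumerate(gt) if missing_fraction(row) <= max_missing]


def mask_depth_outliers(gt: Matrix, dp, low=0.5, high=2.0) -> Matrix:
    """Set genotypes to missing where DP is below low x or above high x the
    individual's mean depth (assumption: mean over that individual's called sites)."""
    out = [list(row) for row in gt]
    for j in range(len(gt[0])):
        called = [dp[i][j] for i in range(len(gt)) if gt[i][j] is not None]
        if not called:
            continue  # no called genotypes to derive a mean from
        m = mean(called)
        for i in range(len(gt)):
            if out[i][j] is not None and not (low * m <= dp[i][j] <= high * m):
                out[i][j] = None
    return out


if __name__ == "__main__":
    # Toy data: 3 sites x 4 individuals (invented, not from the dataset).
    gt = [[0, 1, 2, 1], [1, 1, None, 0], [2, 0, 0, 0]]
    gq = [[40, 20, 50, 60], [35, 35, 35, 35], [99, 99, 99, 99]]
    dp = [[10, 10, 5, 12], [10, 10, 10, 10], [10, 60, 10, 10]]
    masked = mask_low_quality(gt, gq, dp)
    print(keep_sites(masked, 0.5), keep_sites(masked, 0.15))
    print(mask_depth_outliers(masked, dp))
```

Following the order in the description, one would call mask_low_quality, drop sites with keep_sites(..., 0.5), apply the heterozygosity filter, then mask_depth_outliers, and finally keep_sites(..., 0.15). The toy data also shows why depth masking is done per individual: a single very deep call inflates that individual's mean and can push its ordinary calls below the half-mean bound.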
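The heterozygote-excess filter relies on the VCFtools --hardy test, which the VCFtools manual describes as the exact Hardy-Weinberg test of Wigginton, Cutler and Abecasis (2005); to our understanding its output includes a one-sided P_HET_EXCESS column. The sketch below is our own Python re-implementation of that published algorithm for one biallelic site, not VCFtools' source code: from the pooled counts of the two homozygote classes and of heterozygotes, it builds the exact null distribution of heterozygote counts and sums the tail at or above the observed count. Function names and the ALPHA constant (set to the P < 0.01 threshold quoted above) are hypothetical.

```python
from typing import List

ALPHA = 0.01  # threshold quoted in the description (P < 0.01)


def het_probabilities(n_hom1: int, n_het: int, n_hom2: int) -> List[float]:
    """Exact null distribution of heterozygote counts at one biallelic site.

    Index k holds P(k heterozygotes | sample size, allele counts) under
    Hardy-Weinberg equilibrium; entries with the wrong parity stay 0.
    """
    n = n_hom1 + n_het + n_hom2  # called individuals, pooled over all samples
    if n == 0:
        raise ValueError("site has no called genotypes")
    rare = 2 * min(n_hom1, n_hom2) + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)

    # Start near the expected het count, matching the parity of `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # Walk down: two hets become one rare and one common homozygote.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        hets, hom_r, hom_c = hets - 2, hom_r + 1, hom_c + 1

    # Walk up: one homozygote of each kind becomes two hets.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n - hets - hom_r
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * hom_r * hom_c / ((hets + 2) * (hets + 1))
        hets, hom_r, hom_c = hets + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    return [p / total for p in probs]


def p_het_excess(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """One-sided P value: probability of at least the observed number of hets."""
    return sum(het_probabilities(n_hom1, n_het, n_hom2)[n_het:])


def fails_het_excess(n_hom1: int, n_het: int, n_hom2: int, alpha: float = ALPHA) -> bool:
    """True if pooled genotype counts show significant heterozygote excess."""
    return p_het_excess(n_hom1, n_het, n_hom2) < alpha
```

Starting the recursion at the most likely heterozygote count and walking outward keeps every intermediate value at most 1, so the relative probabilities neither overflow nor need factorials. A site whose pooled counts give fails_het_excess(...) == True would be excluded; for instance, ten individuals that are all heterozygous give P = 1024/184756 (about 0.0055) and fall below 0.01.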
Provider: SpecIAnt
Created: 2022-08-10