Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis

Name: Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis
Creator: Heraghty, Sam
Published: 2022-07-14 00:00:00
License: No description available

Source: Figshare; updated 2022-07-14; indexed 2026-04-08

Download link:

https://figshare.com/articles/dataset/b_vanc_fully_filtered_100k_plus_recode_vcf_gz/20310522/2

Description:

Details of the files included in this dataset:

delly_vanc.vcf.gz # Raw output of Delly
b.vanc.fully.filtered.100k.plus.recode.vcf.gz # Output of freebayes, filtered using VCFtools v0.1.13 (Danecek et al. 2011) with the following flags: --remove-indels --min-alleles 2 --max-alleles 2 --minQ 20 --minDP 4 --max-missing 0.75. This file was also filtered to remove sites with unusually high coverage (>2x mean coverage) or excess heterozygosity. Finally, SNPs on scaffolds shorter than 100 kb were removed
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.ANN.vcf.gz # Fully filtered variant file (see manuscript for details) with annotation information
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.impute.vcf.gz # Fully filtered variant file (see manuscript for details) after imputation with Beagle

#### Description of each script contained in this directory ####

Trim_N_QC.sh # Trims raw sequencing data and runs FastQC to evaluate the trimmed data
BWA_PICARD_vanc1.sh # Example of the script used to align sequence data to the reference genome using BWA; also uses Picard tools to sort, deduplicate and index BAM files
P_call_test-2-vanc.sh # First part of the pipeline for calling SNPs with freebayes (calls freebayes-parallel-part1_vanc.sh)
freebayes-parallel-part1_vanc.sh # Called by P_call_test-2-vanc.sh (see above)
Filter_vanc.sh # Creates the list of SVs to filter from the Delly output
filter_delly.sh # Filters the Delly output based on the generated list of SVs
delly_vanc.sh # Calls SVs using Delly
bcf2vcf.sh # Converts BCF output from Delly to VCF format
freebayes-parallel-part2.sh # Second part of the freebayes pipeline
merge_vanc_vars.sh # Second part of the freebayes pipeline (calls freebayes-parallel-part2.sh)
site_depth_vanc.sh # Gets site depth per SNP
remove_highdepth_vanc.sh # Removes SNPs above the depth threshold
hardy_vanc.sh # Calculates HWE per SNP
remove_hwe_vanc.sh # Removes SNPs based on the HWE threshold
filter_vcf_size.sh # Removes SNPs on scaffolds shorter than 100 kb
filter_vcf_maf05.sh # Filters SNPs using a 5% MAF threshold
beagle.sh # Imputes genotypes using Beagle
LEA_con.R # Converts the VCF file into LFMM and geno formats
Snpeff_ANN.sh # Annotates the VCF file using SnpEff
plink_for_sambaR.sh # Converts the VCF file into a format ready for use in SambaR
LD_test.sh # Example of the script used to calculate LD per scaffold
vcf_stats.sh # Gets various statistics from the final filtered VCF
get_pi_diversity.sh # Gets per-population nucleotide diversity
sambaR.R # Runs SambaR
lfmm2_analysis.R # Code for analysing the output of LFMM2 and generating graphs
Max_ent_map.R # Generates the MaxEnt map
RDA_script.R # Code for RDA analysis of structural variants
snprelate_script.R # Runs SNPRelate and makes graphs of Fst and pi along scaffolds of interest
repeat_correctedfst.R # Analysis of the correlation between repeat density and Fst
LD_script.R # Analysis of linkage
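
For readers who want to see what the SNP filtering steps described above amount to, the following is a minimal Python sketch on in-memory records. It is not the authors' code (their pipeline uses VCFtools and the shell scripts listed above). The Site class, the function names and the example values are hypothetical. The sketch assumes that "mean coverage" is the mean of summed per-sample depth across sites that passed the VCFtools-style filters, and it leaves out the excess-heterozygosity (HWE) step. In VCFtools, --minDP applies per genotype and --max-missing 0.75 keeps sites where at least 75% of genotypes are called; the sketch follows that reading.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Site:
    """One variant record reduced to the fields the filters need (hypothetical)."""
    scaffold: str
    pos: int
    ref: str
    alts: List[str]
    qual: float
    depths: List[Optional[int]]  # per-sample read depth; None = missing genotype


def passes_site_filters(site: Site, min_q: float = 20.0, min_dp: int = 4,
                        max_missing: float = 0.75) -> bool:
    """Simplified analogue of the VCFtools flags quoted in the description."""
    # --remove-indels: every allele must be a single base
    if len(site.ref) != 1 or any(len(a) != 1 for a in site.alts):
        return False
    # --min-alleles 2 --max-alleles 2: exactly one alternate allele
    if len(site.alts) != 1:
        return False
    # --minQ 20: drop low-quality sites
    if site.qual < min_q:
        return False
    if not site.depths:
        return False
    # --minDP 4: genotypes below the depth cutoff count as missing
    called = [d for d in site.depths if d is not None and d >= min_dp]
    # --max-missing 0.75: require at least 75% called genotypes
    return len(called) / len(site.depths) >= max_missing


def total_depth(site: Site) -> int:
    """Summed depth across samples, treating missing genotypes as zero."""
    return sum(d or 0 for d in site.depths)


def filter_sites(sites: List[Site], scaffold_lengths: Dict[str, int],
                 min_scaffold_len: int = 100_000,
                 depth_factor: float = 2.0) -> List[Site]:
    """Apply site filters, then the high-depth and scaffold-length filters."""
    kept = [s for s in sites if passes_site_filters(s)]
    if not kept:
        return []
    # Assumption: the depth ceiling is depth_factor times the mean site depth
    ceiling = depth_factor * mean(total_depth(s) for s in kept)
    return [s for s in kept
            if total_depth(s) <= ceiling
            and scaffold_lengths[s.scaffold] >= min_scaffold_len]


# Made-up example data, for illustration only
scaffold_lengths = {"scaf1": 250_000, "scaf2": 50_000}
example_sites = [
    Site("scaf1", 100, "A", ["G"], 50.0, [10, 10, 10, 10]),      # passes
    Site("scaf1", 200, "AT", ["A"], 50.0, [10, 10, 10, 10]),     # indel
    Site("scaf1", 300, "C", ["T"], 10.0, [10, 10, 10, 10]),      # low quality
    Site("scaf1", 400, "G", ["A"], 50.0, [10, 10, 2, None]),     # too much missing data
    Site("scaf1", 500, "T", ["C"], 50.0, [100, 100, 100, 100]),  # unusually high depth
    Site("scaf2", 600, "A", ["C"], 50.0, [10, 10, 10, 10]),      # scaffold under 100 kb
    Site("scaf1", 700, "A", ["C", "G"], 50.0, [10, 10, 10, 10]), # not biallelic
    Site("scaf1", 800, "G", ["T"], 50.0, [12, 8, 10, 10]),       # passes
]
kept = filter_sites(example_sites, scaffold_lengths)
print([(s.scaffold, s.pos) for s in kept])
```

Keeping the VCFtools-style checks in one function and the dataset-wide steps (depth ceiling, scaffold length) in another mirrors the order in the description, where the depth and scaffold filters are applied after the initial VCFtools pass.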

Provider:

Heraghty, Sam

Created:

2022-07-14