
Genomic variant data and codes used for analysis in the manuscript - Whole genome sequencing reveals the structure of environment associated divergence in a broadly distributed montane bumble bee, Bombus vancouverensis

Figshare · Updated 2022-07-14 · Indexed 2026-04-08
Download link:
https://figshare.com/articles/dataset/b_vanc_fully_filtered_100k_plus_recode_vcf_gz/20310522/2
Resource description:
Details of the files included in this dataset are given below.

delly_vanc.vcf.gz # Raw output of Delly
b.vanc.fully.filtered.100k.plus.recode.vcf.gz # Output of freebayes, filtered using VCFtools v0.1.13 (Danecek et al. 2011) with the following flags: --remove-indels --min-alleles 2 --max-alleles 2 --minQ 20 --minDP 4 --max-missing 0.75. This file was also filtered to remove sites with unusually high coverage (>2x mean coverage) or excess heterozygosity. Finally, SNPs that fell on scaffolds less than 100 kb in length were removed
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.ANN.vcf.gz # Fully filtered variant file (see manuscript for details) with annotation information
b.vanc.fully.filtered.100k.plus.recode.maf05.recode.impute.vcf.gz # Fully filtered variant file (see manuscript for details) after imputation with Beagle

#### Description of each script contained in this directory ####

Trim_N_QC.sh # Trims raw sequencing data and runs FastQC to evaluate the trimmed data
BWA_PICARD_vanc1.sh # Example of the script used to align sequence data to the reference genome using BWA; also uses Picard tools to sort, deduplicate and index BAM files
P_call_test-2-vanc.sh # First part of the pipeline for calling SNPs with freebayes (calls freebayes-parallel-part1_vanc.sh)
freebayes-parallel-part1_vanc.sh # See above
Filter_vanc.sh # Creates the list of SVs to filter from the Delly output
filter_delly.sh # Filters based on the generated list of SVs
delly_vanc.sh # Calls SVs using Delly
bcf2vcf.sh # Converts BCF output from Delly to VCF format
freebayes-parallel-part2.sh # Second part of the freebayes pipeline
merge_vanc_vars.sh # Second part of the freebayes pipeline (calls freebayes-parallel-part2.sh)
site_depth_vanc.sh # Gets site depth per SNP
remove_highdepth_vanc.sh # Removes SNPs above the depth threshold
hardy_vanc.sh # Calculates HWE per SNP
remove_hwe_vanc.sh # Removes SNPs based on the HWE threshold
filter_vcf_size.sh # Removes SNPs on scaffolds less than 100 kb in size
filter_vcf_maf05.sh # Filters SNPs using a 5% MAF threshold
beagle.sh # Imputes using Beagle
LEA_con.R # Converts the VCF file into LFMM and geno format
Snpeff_ANN.sh # Annotates the VCF file using SnpEff
plink_for_sambaR.sh # Converts the VCF file into a format ready for use in SambaR
LD_test.sh # Example of the script used to calculate LD per scaffold
vcf_stats.sh # Gets various statistics from the final filtered VCF
get_pi_diversity.sh # Gets per-population nucleotide diversity
sambaR.R # Runs SambaR
lfmm2_analysis.R # Code for running analysis on the output of LFMM2 and generating graphs
Max_ent_map.R # Generates the MaxEnt map
RDA_script.R # Code for RDA analysis of structural variants
snprelate_script.R # Runs SNPRelate and makes graphs of Fst and pi along scaffolds of interest
repeat_correctedfst.R # Analysis of the correlation between repeat density and Fst
LD_script.R # Analysis of linkage
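To make the site-level filters listed for b.vanc.fully.filtered.100k.plus.recode.vcf.gz more concrete, the following is a minimal Python sketch of what those flags and the scaffold-length step select for. It is not the authors' code: the actual filtering was done with VCFtools and the shell scripts listed above. The function names (`filter_sites`, `genotype_called`), the use of `##contig` header lines to obtain scaffold lengths, the inclusive treatment of the quality threshold, and the handling of genotypes without a DP value are all assumptions made for illustration. The high-coverage and excess-heterozygosity filters are not reproduced here.

```python
from typing import Iterable, Iterator


def genotype_called(sample: str, keys: list, min_dp: int) -> bool:
    """True if the genotype is non-missing and has depth >= min_dp.

    Mirrors the idea behind --minDP: low-depth genotypes count as missing.
    Treating an absent DP value as missing is an assumption of this sketch.
    """
    values = dict(zip(keys, sample.split(":")))
    if "." in values.get("GT", "."):
        return False
    dp = values.get("DP", ".")
    return dp.isdigit() and int(dp) >= min_dp


def filter_sites(
    lines: Iterable[str],
    min_qual: float = 20.0,          # --minQ 20 (inclusive here: an assumption)
    min_dp: int = 4,                 # --minDP 4
    min_call_rate: float = 0.75,     # --max-missing 0.75 (fraction of genotypes called)
    min_scaffold_len: int = 100_000, # scaffolds shorter than 100 kb are dropped
) -> Iterator[str]:
    """Yield header lines plus the records that pass every filter."""
    scaffold_len = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##contig="):
            # Hypothetical source of scaffold lengths: ##contig=<ID=...,length=...>
            body = line[len("##contig="):].strip("<>")
            fields = dict(kv.split("=", 1) for kv in body.split(",") if "=" in kv)
            if "ID" in fields and "length" in fields:
                scaffold_len[fields["ID"]] = int(fields["length"])
        if line.startswith("#"):
            yield line
            continue

        cols = line.split("\t")
        chrom, ref, alt, qual = cols[0], cols[3], cols[4], cols[5]
        alts = alt.split(",")

        # --min-alleles 2 --max-alleles 2: exactly one ALT allele.
        if len(alts) != 1 or alts[0] == ".":
            continue
        # --remove-indels: REF and ALT must have the same length.
        if len(ref) != len(alts[0]):
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        if scaffold_len.get(chrom, 0) < min_scaffold_len:
            continue

        keys = cols[8].split(":")
        samples = cols[9:]
        called = sum(genotype_called(s, keys, min_dp) for s in samples)
        if not samples or called / len(samples) < min_call_rate:
            continue
        yield line
```

The function takes any iterable of text lines, so a compressed file could be streamed through it with `gzip.open(path, "rt")`. Note that in VCFtools `--max-missing 0.75` keeps sites where at least 75% of genotypes are called, which is why the sketch names the parameter `min_call_rate`.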

Provided by:
Heraghty, Sam
Created:
2022-07-14