
VCF file of all heliothine individuals that have undergone whole-genome sequencing, aligned to the B3 and B1/B2 BACs

DataONE | Updated 2016-09-28 | Indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Resource description:
Chromosome "1" contains the B3 BAC; chromosome "B1_B2" contains the B1/B2 BAC. Heliothine moths were collected between 2004 and 2014 from 16 countries around the world, across various climatic zones and altitudes (Tables S1 and S2); many of these collections are described in Behere et al. (2007) and Tay et al. (2013). Samples were collected as larvae from wild and crop host plants, as adult moths via light or pheromone traps, or as larvae after bioassay, and were preserved in ethanol (>95%) or RNAlater, or stored at -20°C prior to DNA extraction. DNA was extracted using DNeasy Blood and Tissue kits (Qiagen) and quantified with a Qubit 2.0. Nextera libraries were produced following the manufacturer's instructions and sequenced as 100 bp paired-end (PE) reads on an Illumina HiSeq 2000 (Biological Resources Facility, Australian National University, Canberra, Australia, and Beijing Genomics Institute, Hong Kong). Sample and sequencing data are included in the supplementary material (Table S2). Raw reads were aligned with BBMap to BAC sequences originally derived from H. armigera and available on NCBI (accessions in the supplementary document). Reads were trimmed when quality in at least 2 bases fell below Q10. Only uniquely aligning reads were included in the analysis, to prevent spuriously inferring evolutionary processes occurring independently on each BAC. The output BAM files were sorted, duplicate reads were removed, and read groups were added using Picard v. 1.138 (http://picard.sourceforge.net). BAC reference sequences were indexed using Samtools v. 1.1.0 (Li et al. 2009). UnifiedGenotyper in GATK v. 3.3-0 (McKenna et al. 2010) was used to estimate genotypes across all individuals simultaneously, with a heterozygosity value of 0.01. Variant call format (VCF) files containing SNP calls were converted to Plink format using VCFtools v. 0.1.12b (Danecek et al. 2011).
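For readers opening the VCF, the contig naming stated at the start of this description (contig "1" for the B3 BAC, contig "B1_B2" for the B1/B2 BAC) can be checked with a few lines of Python. The sketch below is illustrative only: the function name `count_records_per_bac` and the example records are made up for this purpose and are not taken from the dataset, and the file is assumed to be plain, uncompressed VCF text.

```python
from collections import Counter
from typing import Iterable

# Contig names as stated in the dataset description.
BAC_BY_CONTIG = {"1": "B3 BAC", "B1_B2": "B1/B2 BAC"}


def count_records_per_bac(vcf_lines: Iterable[str]) -> Counter:
    """Count variant records per BAC, skipping blank lines and '#' header lines."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        # The first tab-separated VCF column (CHROM) holds the contig name.
        contig = line.split("\t", 1)[0]
        counts[BAC_BY_CONTIG.get(contig, "unknown contig: " + contig)] += 1
    return counts


# Made-up VCF fragment for illustration only (not real data from the dataset).
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_1",
    "1\t101\t.\tA\tG\t50\t.\t.\tGT\t0/1",
    "1\t250\t.\tC\tT\t60\t.\t.\tGT\t1/1",
    "B1_B2\t77\t.\tG\tA\t45\t.\t.\tGT\t0/0",
]
counts = count_records_per_bac(example)
print(counts)
```

With the real file, the same function can be passed an open file handle in place of the list; a gzip-compressed file would first need to be opened in text mode with the standard `gzip` module.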
When linkage disequilibrium (LD)-based pruning was necessary, Plink v. 1.07 (Purcell et al. 2007) was used to remove one SNP from each pair exceeding a pairwise LD threshold (r² = 0.5) within windows of 50 SNPs, moving forward 5 SNPs per iteration.
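The sliding-window pruning described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Plink's actual implementation: r² is taken as the squared Pearson correlation of genotype dosages (0/1/2), missing genotypes are not handled, and the later SNP of each offending pair is always the one dropped, which may differ from Plink's choice. The names `r_squared` and `ld_prune` are hypothetical, and the genotypes are made up.

```python
from typing import List, Sequence


def r_squared(a: Sequence[int], b: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return cov * cov / (var_a * var_b)


def ld_prune(genotypes: List[Sequence[int]], window: int = 50, step: int = 5,
             threshold: float = 0.5) -> List[int]:
    """Return indices of SNPs kept after sliding-window pairwise LD pruning."""
    kept = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        for i in range(start, end):
            if not kept[i]:
                continue
            for j in range(i + 1, end):
                if kept[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    kept[j] = False  # simplification: always drop the later SNP
        start += step  # advance the window by `step` SNPs
    return [i for i, keep in enumerate(kept) if keep]


# Made-up genotypes: one row per SNP, one column per individual.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0, so r² = 1
    [2, 2, 0, 0, 1, 1],  # weakly correlated with SNP 0 (r² = 0.0625)
]
kept_indices = ld_prune(snps, window=50, step=5, threshold=0.5)
print(kept_indices)
```

The defaults mirror the parameters given in the description (50-SNP window, 5-SNP step, r² = 0.5). Overlapping windows recompute some pairs, which keeps the sketch short at the cost of efficiency.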

Created:
2016-09-28