
Raw BRCA1/2 variants in breast cancer patients and healthy relatives produced with GATK.

NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/records/215615
Resource description:
Aligned sequencing data are available in the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) under accession SRP095082. Variants were called using GATK HaplotypeCaller (version 3.6). After joint genotyping, a multi-sample VCF file was generated. Next, SNPs and indels were extracted into two separate VCF files, and a specific set of hard filters was applied to each.

File descriptions

Datasets

BRCA_SNVs.vcf – SNPs called with GATK, with hard filters applied. The following filter expressions were used: "QD < 2.0", "FS > 60.0", "MQ < 40.0", "MQRankSum < -12.5", "ReadPosRankSum < -8.0", "SB < -0.10", "DP < 10", "GQ < 30", and "SOR > 3.0".
BRCA_indels.vcf – indels called with GATK, with hard filters applied. The following filter expressions were used: "QD < 2.0", "FS > 200.0", "ReadPosRankSum < -20.0", "InbreedingCoeff < -0.8", and "SOR > 10.0".

Scripts package (scritps.zip)

The Scripts.zip file contains scripts and supporting files for genotype calling and filtering.

raw.variant.caling.sh – BAM file preprocessing, alignment refinement, and raw genotype calling with HaplotypeCaller.
genotyping_and_filtering.sh – joint genotyping, variant hard filtering, and callset refinement.
LIST.txt – supporting file listing the names of the BAM files that contain the aligned reads.
sample_order.txt – supporting file for sample renaming.

Reference files (hg19) used in variant calling scripts

Reference files can be downloaded from the GATK bundle website at https://software.broadinstitute.org/gatk/download/bundle.

ucsc.hg19.fasta – human genome assembly;
Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz – set of known indels used for local realignment;
1000G_phase1.indels.hg19.sites.vcf.gz – set of known indels used for local realignment;
dbsnp_138.hg19.vcf.gz – a recent dbSNP release (build 138);
1000G_phase3_v4_20130502.hg19.lifted.sites.vcf – the latest set from 1000G phase 3 (v4), used for genotype refinement.
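The hard-filter thresholds listed for the two VCF files can be illustrated with a small Python sketch. This is not the code from the scripts package, which applies the filters through GATK; it only shows what the listed expressions mean when evaluated against one variant's annotations. The names `SNP_FILTERS`, `INDEL_FILTERS`, and `failed_filters` are hypothetical, and the choice to skip annotations that are missing from a record is an assumption of this sketch.

```python
import operator

# Thresholds copied from the dataset description; a variant is flagged
# when any expression evaluates to true.
SNP_FILTERS = [
    ("QD", "<", 2.0), ("FS", ">", 60.0), ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5), ("ReadPosRankSum", "<", -8.0),
    ("SB", "<", -0.10), ("DP", "<", 10), ("GQ", "<", 30), ("SOR", ">", 3.0),
]
INDEL_FILTERS = [
    ("QD", "<", 2.0), ("FS", ">", 200.0), ("ReadPosRankSum", "<", -20.0),
    ("InbreedingCoeff", "<", -0.8), ("SOR", ">", 10.0),
]

_OPS = {"<": operator.lt, ">": operator.gt}


def failed_filters(annotations, filters):
    """Return the filter expressions that this variant's annotations trigger.

    `annotations` maps annotation names (e.g. "QD") to numeric values.
    Assumption of this sketch: an annotation absent from the record is
    skipped rather than counted as a failure.
    """
    failed = []
    for key, op, threshold in filters:
        value = annotations.get(key)
        if value is None:
            continue
        if _OPS[op](value, threshold):
            failed.append(f"{key} {op} {threshold}")
    return failed


# Hypothetical annotation values, for illustration only.
example_snp = {"QD": 1.5, "FS": 10.0, "MQ": 60.0, "SOR": 0.9}
print(failed_filters(example_snp, SNP_FILTERS))
```

An empty result means the variant triggers none of the listed expressions. Note that the indel thresholds are looser than the SNP ones for FS, ReadPosRankSum, and SOR, so the same annotation values can fail as a SNP yet pass as an indel.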
Created:
2020-01-24