Filtered and annotated SNV and indel variants in the PC3 and LNCaP human prostate cancer cell lines
NIAID Data Ecosystem, indexed 2026-03-11
Download link:
https://zenodo.org/records/245431
Description:
150 bp paired-end reads (insert size 350 bp) were obtained using the Illumina HiSeqX sequencer. Whole-genome reads were aligned to human reference genome GRCh38 build 82, and the resulting indexed BAM files were interrogated with Samtools v1.3.1 mpileup and bcftools to generate a VCF (Variant Call Format) file of single nucleotide variants (SNVs) and short indel variants. Variants private (unique) to one cell line, or shared by both, were then identified. Variants present in HapMap, 1000 Genomes phase 3 (2,504 human genomes), and the National Heart, Lung, and Blood Institute's Exome Sequencing Project (ESP) were excluded as likely common germline variants (bundled variant data file available at https://goo.gl/mEogvD). Variant files (VCF) were filtered using SnpSift with the following expression: 'QUAL >= 200 && DP >= 30', where QUAL sets the minimum variant confidence and DP the minimum total read depth. Filtered variants were annotated using SnpEff v4.3g. Please see https://github.com/sciseim/PCaWGS for associated scripts.
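The selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' code (their scripts are in the linked repository and may differ); it only mirrors the stated logic on plain VCF text: partition sites into private and shared sets, drop sites found in a known-variant set, and keep calls with QUAL >= 200 and DP >= 30. All names here (`Variant`, `parse_vcf`, `partition_sites`, `drop_known`, `quality_filter`) are hypothetical, DP is assumed to be stored in the INFO column, and multi-allelic ALT values are compared as a single string for simplicity.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Set, Tuple

SiteKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    depth: int

    @property
    def key(self) -> SiteKey:
        """Identity used to compare calls between cell lines."""
        return (self.chrom, self.pos, self.ref, self.alt)


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per data line, skipping headers and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
        # INFO holds ';'-separated KEY=VALUE pairs and bare flags (e.g. INDEL).
        info_map = dict(
            item.split("=", 1) for item in info.split(";") if "=" in item
        )
        if qual == "." or "DP" not in info_map:
            continue  # assumption: records lacking QUAL or DP are skipped
        yield Variant(chrom, int(pos), ref, alt, float(qual), int(info_map["DP"]))


def partition_sites(
    pc3: Iterable[Variant], lncap: Iterable[Variant]
) -> Dict[str, Set[SiteKey]]:
    """Split site keys into private-to-PC3, private-to-LNCaP, and shared."""
    pc3_keys = {v.key for v in pc3}
    lncap_keys = {v.key for v in lncap}
    return {
        "PC3_private": pc3_keys - lncap_keys,
        "LNCaP_private": lncap_keys - pc3_keys,
        "shared": pc3_keys & lncap_keys,
    }


def drop_known(variants: Iterable[Variant], known: Set[SiteKey]) -> List[Variant]:
    """Remove variants whose site appears in a set of known (germline) sites."""
    return [v for v in variants if v.key not in known]


def quality_filter(
    variants: Iterable[Variant], min_qual: float = 200.0, min_depth: int = 30
) -> List[Variant]:
    """Keep variants satisfying QUAL >= min_qual and DP >= min_depth."""
    return [v for v in variants if v.qual >= min_qual and v.depth >= min_depth]
```

In practice `parse_vcf` would be fed an open file handle for each cell line's uncompressed VCF, and `known` would be built from the HapMap, 1000 Genomes, and ESP site lists. Both thresholds are inclusive, matching the `>=` in the filter expression.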
Created:
2020-01-24



