VCF collection for phylogenies and comparative genomics
Source: Figshare | Updated: 2023-07-07 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/VCF_collection_for_phylogenies_and_comparative_genomics/23642028/1
Description:
To compare our novel isolates with previously described members of various S. cerevisiae clades, including animal husbandry isolates and a Hungarian baker's yeast, raw sequencing files from the literature [Offei et al. 2019; Peter et al. 2018; Duan et al. 2018; Imre et al. 2022; Rácz et al. 2021] were downloaded and included in our pipeline. The complete list of genomes used from the literature is given in Table S1.

Starting from BAM files, local realignment around indels and joint variant calling and filtering were performed for the isolates with GATK 4.1.9.0 [Poplin et al. 2018; Van der Auwera et al. 2013], excluding regions annotated in the S288c reference as centromeric regions, telomeric regions, or LTRs. First, genomic VCF (gVCF) files were produced with HaplotypeCaller, and the gVCF files were jointly genotyped. From the resulting VCF, SNPs and indels were selected into separate files. SNPs were hard-filtered using the thresholds of [Fay et al. 2019]: a SNP failed if QD < 5.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, or ReadPosRankSum < -8.0. Indels failed if QD < 5.0, QUAL < 30.0, FS > 60.0, or ReadPosRankSum < -20.0, and were then left-aligned. To produce the final VCF, the SNP and indel sets were merged and filtered, and non-variant sites were removed.

This initial VCF was used for one round of base quality score recalibration (BQSR) with GATK, after which the BAM files were called again in gVCF mode and the whole calling process was repeated. Cohort calling for the yeasts closely related to SILY0002 was also performed with one round of BQSR using the same process, but on the BAM files obtained by mapping to the assembled SILY0002 genome.
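The hard-filter thresholds above amount to a simple rule: a variant fails if any listed annotation crosses its threshold. The Python sketch below encodes the quoted SNP and indel thresholds and checks a variant's INFO annotations against them. It only illustrates the logic; the actual pipeline used GATK, and the function names, the treatment of missing annotations as passing, and the example annotation values are assumptions made for this sketch.

```python
from typing import Mapping, Optional

# Thresholds as quoted in the description (SNP set after Fay et al. 2019).
# Each entry maps an annotation to (comparison, threshold); a variant FAILS
# if the comparison is true for an annotation that is present.
SNP_FILTERS = {
    "QD": ("<", 5.0),
    "QUAL": ("<", 30.0),
    "SOR": (">", 3.0),
    "FS": (">", 60.0),
    "MQ": ("<", 40.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}
INDEL_FILTERS = {
    "QD": ("<", 5.0),
    "QUAL": ("<", 30.0),
    "FS": (">", 60.0),
    "ReadPosRankSum": ("<", -20.0),
}


def failed_filters(annotations: Mapping[str, Optional[float]],
                   rules: Mapping[str, tuple]) -> list:
    """Return, in rule order, the annotations whose threshold the variant fails."""
    failed = []
    for name, (op, threshold) in rules.items():
        value = annotations.get(name)
        if value is None:
            # Assumption of this sketch: a missing annotation does not fail.
            continue
        if (op == "<" and value < threshold) or (op == ">" and value > threshold):
            failed.append(name)
    return failed


def passes_hard_filters(annotations: Mapping[str, Optional[float]],
                        variant_type: str) -> bool:
    """variant_type is "SNP" or "INDEL" (hypothetical labels for this sketch)."""
    if variant_type not in ("SNP", "INDEL"):
        raise ValueError(f"unknown variant type: {variant_type!r}")
    rules = SNP_FILTERS if variant_type == "SNP" else INDEL_FILTERS
    return not failed_filters(annotations, rules)


if __name__ == "__main__":
    # Hypothetical annotation values, not taken from the dataset.
    good_snp = {"QD": 12.3, "QUAL": 250.0, "SOR": 0.9, "FS": 1.2,
                "MQ": 60.0, "MQRankSum": 0.0, "ReadPosRankSum": -0.5}
    bad_snp = dict(good_snp, QD=3.1, FS=75.0)
    print(passes_hard_filters(good_snp, "SNP"))   # True
    print(failed_filters(bad_snp, SNP_FILTERS))   # ['QD', 'FS']
    # ReadPosRankSum = -10 fails the SNP threshold (-8.0) but not the indel one (-20.0).
    print(failed_filters({"ReadPosRankSum": -10.0}, SNP_FILTERS))    # ['ReadPosRankSum']
    print(failed_filters({"ReadPosRankSum": -10.0}, INDEL_FILTERS))  # []
```

The example shows why separate SNP and indel thresholds matter: the same ReadPosRankSum value can remove a SNP while keeping an indel.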
For allele calling on the SILY0002 mapping, chromosome copy numbers (as determined below) were specified during calling with HaplotypeCaller, and regions annotated in the SILY0002 genome by the LRSDAY pipeline as centromeric regions, telomeric regions, or LTRs were excluded. The combined called VCF files were uploaded to Figshare.
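Both the S288c-based and the SILY0002-based calling excluded annotated centromeric, telomeric, and LTR regions. GATK tools can restrict processing natively with `--exclude-intervals` (`-XL`); the Python sketch below only illustrates the underlying interval test. It assumes the excluded regions are available as 0-based, half-open (BED-style) tuples; the function names, chromosome names, and coordinates are hypothetical. It checks only a variant's start position, so an indel that starts just outside a region but extends into it would not be caught.

```python
import bisect
from collections import defaultdict


def build_exclusion_index(intervals):
    """Merge (chrom, start, end) intervals per chromosome.

    Intervals are 0-based, half-open (BED convention). Returns
    {chrom: (sorted_starts, matching_ends)}, with overlapping or abutting
    intervals merged so a binary search over the starts is sufficient.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_excluded(index, chrom, pos):
    """True if the 1-based VCF POS on `chrom` falls in an excluded interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    # Rightmost merged interval whose start is <= the position.
    i = bisect.bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


if __name__ == "__main__":
    # Hypothetical regions, not the real S288c or SILY0002 annotations.
    idx = build_exclusion_index([("chrI", 0, 10), ("chrI", 5, 20), ("chrIII", 100, 200)])
    print(is_excluded(idx, "chrI", 20))    # True  (last base of the merged 0-20 region)
    print(is_excluded(idx, "chrI", 21))    # False
    print(is_excluded(idx, "chrIII", 100)) # False (0-based 99 precedes the region)
    print(is_excluded(idx, "chrII", 5))    # False (no regions on this chromosome)
```

Merging overlapping intervals up front keeps each lookup to a single binary search per variant.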
Provider:
Pfliegler, Walter
Created:
2023-07-07



