Instructions on how to run PanGenie on HGSVC3 + HPRC data
Download link:
https://zenodo.org/record/12772894
Description:
Input Data
Reference genome
The reference genome used (CHM13) can be obtained from: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz
Reads
We used the raw 1000 Genomes Project (1kg) Illumina reads (FASTA/FASTQ) obtained from: http://ftp.sra.ebi.ac.uk/vol1/fastq/. We created a single FASTQ file per sample by concatenating the individual files. However, PanGenie can also be run with multiple files per sample (see the genotyping instructions below).
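The concatenation step could look roughly like the sketch below. It assumes the downloaded files are gzip-compressed FASTQ files; the function name `concat_fastq` and the file names in the usage comment are hypothetical. Gzip streams can be joined byte for byte, so the files do not need to be decompressed first.

```shell
# Concatenate several gzip-compressed FASTQ files into one per-sample file.
# The first argument is the output file, all remaining arguments are inputs.
concat_fastq() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Hypothetical usage (file names are placeholders):
# concat_fastq sample.fastq.gz run1_1.fastq.gz run1_2.fastq.gz
```

The resulting file can be read with `zcat` or `gzip -dc` like any other gzip file.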
Input panel VCF
The input VCF was derived from Minigraph-Cactus VCFs produced from 65 HGSVC3 and 42 HPRC assemblies. It is available in this repository: MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf.gz. An additional VCF contains the decomposed variant alleles; it is needed to postprocess the PanGenie output, that is, to translate bubble genotypes into genotypes for all nested variants. This VCF is also available in this repository: MC_hgsvc3-hprc_chm13_filtered_decomposed.vcf.gz
Running PanGenie
We used PanGenie version v3.1.0 and ran it via Singularity as described in the README (https://github.com/eblerjana/pangenie/tree/master). The Singularity image used is available in this repository: eblerjana_eblerjana_pangenie-v3.1.0.sif
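As a rough sketch of how commands might be run through the image, the wrapper below prefixes a command with `singularity exec`. The function name `pangenie_exec` and the variables `SIF` and `DRY_RUN` are illustrative assumptions and not part of PanGenie; the README remains the authoritative source for the exact invocation.

```shell
# Path to the Singularity image from this repository (adjust as needed).
SIF="${SIF:-eblerjana_eblerjana_pangenie-v3.1.0.sif}"

# Run a command inside the image. With DRY_RUN=1 the command is only printed.
pangenie_exec() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "singularity exec $SIF $*"
    else
        singularity exec "$SIF" "$@"
    fi
}

# Hypothetical usage:
# pangenie_exec PanGenie-index -v panel.vcf -r reference.fa -o index -t 24
```

Printing the command first with `DRY_RUN=1` is a cheap way to check paths before starting a long-running job.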
Commands used for genotyping
(1) Indexing (run only once):
PanGenie-index -v MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf -r chm13v2.0.fa -o index -t 24
Note: the input panel VCF (-v) as well as the reference genome (-r) need to be uncompressed.
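Since both inputs are distributed in compressed form, they need to be decompressed before indexing. A minimal sketch, assuming standard `gzip` is available; the helper name `decompress_keep` is hypothetical:

```shell
# Decompress a .gz file next to the original and keep the compressed copy.
# "file.vcf.gz" becomes "file.vcf" in the same directory.
decompress_keep() {
    gz="$1"
    gzip -dc "$gz" > "${gz%.gz}"
}

# Usage with the files named in this document:
# decompress_keep MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf.gz
# decompress_keep chm13v2.0.fa.gz
```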
(2) Genotyping (run for each sample):
PanGenie -a 108 -f index -i <(zcat <sample>.R1.fq.gz <sample>.R2.fq.gz) -o pangenie_<sample> -j 24 -t 24 -s <sample>
-j and -t specify the number of threads used for the internal k-mer counting step and for the genotyping step, respectively (here 24 threads are used for both).
The -i parameter is used to provide the input reads (in FASTA/FASTQ format). If you have more than one file per sample (e.g. for paired-end reads), use the process substitution syntax shown in the command above to avoid having to concatenate the files beforehand.
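Because the genotyping command is run once per sample, it can be convenient to generate one command line per sample from a list of sample names. The sketch below does only that and does not run PanGenie itself. It assumes that the sample name goes before `.R1.fq.gz` and `.R2.fq.gz`, after `pangenie_`, and after `-s`; the function name `make_genotyping_cmds` and the file `samples.txt` are hypothetical.

```shell
# Print one PanGenie genotyping command per sample name read from stdin.
# The read file naming (<sample>.R1.fq.gz / <sample>.R2.fq.gz) is an assumption.
make_genotyping_cmds() {
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue   # skip empty lines
        printf 'PanGenie -a 108 -f index -i <(zcat %s.R1.fq.gz %s.R2.fq.gz) -o pangenie_%s -j 24 -t 24 -s %s\n' \
            "$sample" "$sample" "$sample" "$sample"
    done
}

# Hypothetical usage: one sample name per line in samples.txt
# make_genotyping_cmds < samples.txt > genotyping_cmds.sh
```

The generated lines use process substitution, so they would need to be executed with bash rather than a plain POSIX shell.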
(3) Decompose bubbles (run for each sample):
cat pangenie_<sample>_genotyping.vcf | python3 convert-to-biallelic.py MC_hgsvc3-hprc_chm13_filtered_decomposed.vcf > pangenie_<sample>_genotyping_biallelic.vcf
Note: the convert-to-biallelic.py script is provided in this repository.
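One simple sanity check after the conversion is to count the variant records in the output VCF. The sketch below relies only on the VCF convention that header lines start with `#`; the helper name `count_vcf_records` is hypothetical. The counts before and after conversion are generally not expected to match, since a bubble may contain several nested variants.

```shell
# Count non-header, non-empty lines (variant records) in an uncompressed VCF.
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Hypothetical usage:
# count_vcf_records pangenie_sample_genotyping_biallelic.vcf
```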
Created:
2024-07-18



