
Instructions on how to run PanGenie on HGSVC3 + HPRC data

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link: https://zenodo.org/record/12772894
Description:
Input Data

Reference genome
The reference genome used (CHM13) can be obtained from: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz

Reads
We used the raw 1kg Illumina reads (FASTA/FASTQ) obtained from: http://ftp.sra.ebi.ac.uk/vol1/fastq/. We created a single FASTQ file per sample by concatenating the individual files. PanGenie can also be run with multiple files per sample (see the genotyping step below for how this is done).

Input panel VCF
The input VCF was derived from Minigraph-Cactus VCFs produced from 65 HGSVC3 and 42 HPRC assemblies. It is available in this repository:
  MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf.gz
An additional VCF contains the decomposed variant alleles. It is needed to postprocess the PanGenie output, that is, to translate bubble genotypes into genotypes for all nested variants. This VCF is also available in this repository:
  MC_hgsvc3-hprc_chm13_filtered_decomposed.vcf.gz

Running PanGenie
We used PanGenie v3.1.0 and ran it via singularity as described in the README (https://github.com/eblerjana/pangenie/tree/master). The singularity image used is available in this repository:
  eblerjana_eblerjana_pangenie-v3.1.0.sif

Commands used for genotyping

(1) Indexing (run only once):
  PanGenie-index -v MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf -r chm13v2.0.fa -o index -t 24
Note: both the input panel VCF (-v) and the reference genome (-r) need to be uncompressed.

(2) Genotyping (run for each sample):
  PanGenie -a 108 -f index -i <(zcat .R1.fq.gz .R2.fq.gz)  -o pangenie_ -j 24 -t 24 -s
The -j and -t parameters specify the number of threads used for the internal k-mer counting and genotyping steps (here 24 cores are used for both). The -i parameter provides the input reads (in FASTA/FASTQ format). If you have more than one file per sample (e.g. for paired-end reads), use the syntax shown in the command above to avoid having to concatenate the files beforehand.

(3) Decompose bubbles (run for each sample):
  cat pangenie__genotyping.vcf | python3 convert-to-biallelic.py MC_hgsvc3-hprc_chm13_filtered_decomposed.vcf > pangenie__genotyping_biallelic.vcf
Note: the convert-to-biallelic.py script is provided in this repository.
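The three steps can be strung together per sample. The sketch below is a dry run: it only prints the commands rather than executing them. It assumes that the gaps in the commands above (before .R1.fq.gz, after pangenie_ and after -s) stand for a per-sample name; that reading is an assumption, not something stated in the description. The function names and the sample argument are hypothetical helpers introduced for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# File names and thread count are taken from the commands in the description.
PANEL=MC_hgsvc3-hprc_chm13_filtered_bubbles.vcf        # uncompressed panel VCF
DECOMPOSED=MC_hgsvc3-hprc_chm13_filtered_decomposed.vcf
REF=chm13v2.0.fa                                       # uncompressed reference
THREADS=24

# Step 1: indexing (printed once).
index_cmd() {
  printf 'PanGenie-index -v %s -r %s -o index -t %s\n' "$PANEL" "$REF" "$THREADS"
}

# Step 2: genotyping for one sample.
# ASSUMPTION: the sample name fills the read-file prefix, the output prefix
# and the value of -s; the original text leaves these positions blank.
genotype_cmd() {
  local sample=$1
  printf 'PanGenie -a 108 -f index -i <(zcat %s.R1.fq.gz %s.R2.fq.gz) -o pangenie_%s -j %s -t %s -s %s\n' \
    "$sample" "$sample" "$sample" "$THREADS" "$THREADS" "$sample"
}

# Step 3: translate bubble genotypes into biallelic genotypes for one sample.
# ASSUMPTION: same sample-name substitution as in step 2.
decompose_cmd() {
  local sample=$1
  printf 'cat pangenie_%s_genotyping.vcf | python3 convert-to-biallelic.py %s > pangenie_%s_genotyping_biallelic.vcf\n' \
    "$sample" "$DECOMPOSED" "$sample"
}

# Print the indexing command once, then steps 2 and 3 for every sample
# name passed on the command line.
index_cmd
for s in "$@"; do
  genotype_cmd "$s"
  decompose_cmd "$s"
done
```

Saved under a hypothetical name such as print_commands.sh, it could be called as bash print_commands.sh SAMPLE1 SAMPLE2 to review the commands before running them inside the singularity image. Printing first makes it easy to check the substituted file names against the files actually present.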
Created: 2024-07-18