Pangenome-based Genome Inference

Source: NIAID Data Ecosystem (indexed 2026-03-13)

Download link:

https://zenodo.org/record/4312742

Description:

This record contains haplotype-resolved assemblies ("haplotype-resolved-assemblies.tar.gz"); the variant callset and pangenome graph produced from these assemblies ("callset-and.graph.tar.gz"); the callsets and graphs used for the "leave-one-out" evaluation ("leave-one-out.tar.gz"); and PanGenie genotypes ("cohort-genotypes.tar.gz") for 300 samples (100 trios) selected from the 1000 Genomes samples.

Abstract: Typical analysis workflows map reads to a reference genome in order to genotype genetic variants. Generating such alignments introduces reference bias and carries a substantial computational burden. In contrast, recent k-mer based genotypers are fast but struggle in repetitive or duplicated genomic regions. We propose a novel algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference in conjunction with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared to mapping-based approaches, PanGenie is more than 4x faster at 30x coverage and reaches significantly better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (>=50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the growing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while remaining scalable to thousands of genotyped samples.
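The abstract relies on k-mer counts from short reads as the evidence for each allele. The sketch below is only a toy illustration of that signal and is not PanGenie's algorithm: the function names (`count_kmers`, `allele_support`), the example reads, and the allele-specific k-mers are all hypothetical, and the pangenome haplotype information that PanGenie combines with the counts is left out entirely.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def allele_support(counts, allele_kmers):
    """Sum the read-derived counts of the k-mers assigned to each allele.

    allele_kmers maps an allele label to k-mers assumed to be unique to it
    (hypothetical input; a real tool would derive these from the graph).
    """
    return {
        allele: sum(counts[kmer] for kmer in kmers)
        for allele, kmers in allele_kmers.items()
    }


# Toy data: three short reads and one distinguishing 4-mer per allele.
reads = ["ACGTAC", "CGTACG", "ACGTTC"]
counts = count_kmers(reads, k=4)
allele_kmers = {"ref": ["GTAC"], "alt": ["GTTC"]}
support = allele_support(counts, allele_kmers)
print(support)
```

In this toy input the reference k-mer is seen more often than the alternative one. Raw counts like these become ambiguous when a k-mer occurs in several places in the genome, which is the weakness in repetitive regions that the abstract attributes to k-mer based genotypers.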

Created:

2021-10-28
