Pangenome-based Genome Inference

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/4312742
Description:
This record contains haplotype-resolved assemblies ("haplotype-resolved-assemblies.tar.gz"), the variant callset and pangenome graph ("callset-and.graph.tar.gz") produced from these assemblies, the callsets and graphs used for the "leave-one-out" evaluation ("leave-one-out.tar.gz"), and PanGenie genotypes ("cohort-genotypes.tar.gz") for 300 samples (100 trios) selected from the 1000 Genomes Project samples.

Abstract: Typical analysis workflows map reads to a reference genome in order to genotype genetic variants. Generating such alignments introduces reference biases and comes with a substantial computational burden. In contrast, recent k-mer-based genotypers are fast, but struggle in repetitive or duplicated genomic regions. We propose a novel algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference in conjunction with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared to mapping-based approaches, PanGenie is more than 4x faster at 30x coverage and reaches significantly better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (>=50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the growing number of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being scalable to thousands of genotyped samples.
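
The idea of using k-mer counts from reads to tell alleles apart can be illustrated with a toy sketch. This is not PanGenie's algorithm and does not use any of the files in this record. It only counts read k-mers that are unique to each of two made-up allele sequences, and the function names (`kmers`, `unique_kmers`, `toy_genotype`), the sequences, and the "support > 0" rule are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def unique_kmers(haplotypes, k):
    """Map each allele name to the k-mers that occur on that allele only."""
    sets = {name: set(kmers(seq, k)) for name, seq in haplotypes.items()}
    result = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        result[name] = own - others
    return result


def toy_genotype(reads, haplotypes, k=5):
    """Count read k-mers supporting each allele (toy rule, no error model)."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    support = {
        name: sum(counts[km] for km in uniq)
        for name, uniq in unique_kmers(haplotypes, k).items()
    }
    # Hypothetical decision rule: an allele is "present" if any
    # allele-specific k-mer was observed in the reads.
    present = sorted(name for name, n in support.items() if n > 0)
    return support, present


# Made-up alleles differing by a single base (C vs T at position 6).
haplotypes = {"ref": "ACGTACGGTCA", "alt": "ACGTATGGTCA"}
reads = ["ACGTACGGT", "GTATGGTCA"]  # one read from each allele
support, present = toy_genotype(reads, haplotypes)
print(support, present)
```

With one read from each allele, both alleles receive support, which the toy rule reads as a heterozygous call. A real genotyper has to handle sequencing errors, coverage variation, and k-mers shared across repeats, none of which this sketch attempts.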
Created:
2021-10-28