SNP datasets and genomes used to benchmark the SNPLift program
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.h9w0vt4nx
Description:
Motivation: The advent of high-throughput sequencing technologies and the
availability of reference genomes have provided an unprecedented
opportunity to discover and genotype millions of genetic variants in
hundreds or even thousands of samples. Variant calling, the identification
of genetic variants from raw sequencing data, is a time-consuming and
computationally expensive process. Reference genomes are now revised
rapidly, and new versions are released ever more frequently. To take
advantage of a new or improved reference genome, raw read alignment,
genotype calling, and filtering must typically all be redone. This is a
costly and time-consuming operation that is not always possible when
projects are under time constraints.

Results: Here, we present
SNPLift, a bioinformatic pipeline that can quickly transfer SNP
coordinates from one version of a genome to another, making it possible to
rapidly leverage the resources represented by new reference genomes. We
tested SNPLift on nine SNP datasets in VCF format from different species
(Homo sapiens, Arabidopsis thaliana, Coregonus clupeaformis, Medicago
truncatula, Oryza sativa, Salvelinus namaycush, Solanum lycopersicum, Zea
mays, and Glycine max). Depending on the species, we accurately lifted
between 82.64% and 99.39% of the variants very quickly and efficiently,
reducing the required computing power by multiple orders of magnitude
compared to a complete re-analysis using the new genome reference. SNPLift
provides an accurate, parallelized, and fast way to update genome
positions, for example those of variant calls, to a new reference genome.

Availability and implementation: SNPLift is available at
https://github.com/enormandeau/snplift with its documentation and
installation procedure. The repository also contains a script that runs an
automated test on a small dataset of 190,443 SNPs on chromosome 1 of
Medicago truncatula. SNPLift uses only common tools that are easy to
install and runs on Linux and macOS.
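
To make "transferring SNP coordinates" concrete, here is a minimal Python sketch of the bookkeeping involved. It is not SNPLift's implementation. It assumes a hypothetical, precomputed `position_map` from old (chromosome, position) pairs to new ones; how such a map is derived is outside the sketch, and strand flips and re-sorting are ignored. The names `lift_vcf_lines` and `percent_lifted` are invented for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[str, int]  # (chromosome name, 1-based position)


def lift_vcf_lines(
    lines: Iterable[str], position_map: Dict[Coord, Coord]
) -> Tuple[List[str], int, int]:
    """Rewrite CHROM and POS of VCF records using a precomputed mapping.

    position_map is a hypothetical input for this sketch; it is assumed to
    map old coordinates to new ones. Records absent from the map are dropped.
    Returns (output lines, total records seen, records lifted).
    """
    lifted: List[str] = []
    n_total = 0
    n_lifted = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            lifted.append(line)  # header lines pass through unchanged
            continue
        n_total += 1
        fields = line.split("\t")
        new = position_map.get((fields[0], int(fields[1])))
        if new is None:
            continue  # no counterpart in the new assembly: not lifted
        fields[0], fields[1] = new[0], str(new[1])
        lifted.append("\t".join(fields))
        n_lifted += 1
    return lifted, n_total, n_lifted


def percent_lifted(n_total: int, n_lifted: int) -> float:
    """Percentage of records that received new coordinates."""
    return 100.0 * n_lifted / n_total if n_total else 0.0
```

The percentage computed by `percent_lifted` is the kind of figure reported above (82.64% to 99.39%), although the exact criteria SNPLift uses to count a variant as accurately lifted are described in its own documentation, not here.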
Provider:
Dryad
Created:
2023-06-13



