SNP datasets and genomes used to benchmark the SNPLift program

Source: DataCite Commons. Updated 2025-06-01. Indexed 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.h9w0vt4nx
Description:
Motivation: The advent of high-throughput sequencing technologies and the availability of reference genomes have provided an unprecedented opportunity to discover and genotype millions of genetic variants in hundreds or even thousands of samples. Variant calling, the identification of genetic variants from raw sequencing data, is a time-consuming and computationally expensive process. Reference genomes are evolving rapidly, and new versions are released more and more frequently. To take advantage of a new or improved reference genome, read alignment, genotype calling, and filtering must typically all be redone. This is a costly and time-consuming operation that is not always possible when projects are under time constraints.
Results: Here, we present SNPLift, a bioinformatic pipeline that can quickly transfer SNP coordinates from one version of a genome to another, making it possible to rapidly leverage the resources represented by new reference genomes. We tested SNPLift on nine SNP datasets in VCF format from different species (Homo sapiens, Arabidopsis thaliana, Coregonus clupeaformis, Medicago truncatula, Oryza sativa, Salvelinus namaycush, Solanum lycopersicum, Zea mays, and Glycine max). Depending on the species, we accurately lifted between 82.64% and 99.39% of the variants, reducing the required computing power by multiple orders of magnitude compared to a complete re-analysis using the new reference genome. SNPLift provides an accurate, parallelized, and fast solution for updating genome positions, for example those of variant calls, to a new reference genome.
Availability and implementation: SNPLift is available at https://github.com/enormandeau/snplift with its documentation and installation procedure. The repository also contains a script that runs an automated test on a small dataset of 190,443 SNPs on chromosome 1 of Medicago truncatula. SNPLift uses only common tools that are easy to install and runs on Linux and macOS.
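As a rough illustration of how a lift rate like the percentages quoted in the description could be computed, here is a minimal sketch. It is not part of SNPLift: the function names `count_vcf_records` and `percent_lifted` are hypothetical, the toy records are made up, and the sketch assumes the rate is simply the number of variant records in the lifted VCF divided by the number in the original VCF. It only counts records and does not check that the new positions are correct.

```python
def count_vcf_records(lines):
    """Count variant records in a VCF: non-empty lines that do not start with '#'."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def percent_lifted(original_lines, lifted_lines):
    """Percentage of original variant records that are present after lifting.

    Assumption: every record in the lifted file corresponds to exactly one
    record in the original file.
    """
    n_original = count_vcf_records(original_lines)
    if n_original == 0:
        raise ValueError("original VCF contains no variant records")
    return 100.0 * count_vcf_records(lifted_lines) / n_original


# Toy in-memory VCFs (made-up records, not taken from the benchmark datasets).
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"

original_vcf = (
    HEADER
    + "chr1\t100\t.\tA\tG\t.\tPASS\t.\n"
    + "chr1\t250\t.\tC\tT\t.\tPASS\t.\n"
    + "chr1\t900\t.\tG\tA\t.\tPASS\t.\n"
    + "chr2\t40\t.\tT\tC\t.\tPASS\t.\n"
).splitlines()

# After lifting, positions have shifted and one variant could not be placed.
lifted_vcf = (
    HEADER
    + "chr1\t112\t.\tA\tG\t.\tPASS\t.\n"
    + "chr1\t262\t.\tC\tT\t.\tPASS\t.\n"
    + "chr2\t38\t.\tT\tC\t.\tPASS\t.\n"
).splitlines()

rate = percent_lifted(original_vcf, lifted_vcf)
print(f"{rate:.2f}% of variants lifted")
```

For real files, the same functions can be fed an open file handle, for example `open("original.vcf")` for an uncompressed VCF (the file name is a placeholder).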
Provider:
Dryad
Created:
2023-06-13