SNP datasets and genomes used to benchmark the SNPLift program

Source: DataONE. Updated 2023-06-13; indexed 2024-06-08.

Download link:

https://search.dataone.org/view/sha256:6f03c69fe78c1e1d089f152b927f7d68ffc5e6db503498847cb2dfc7ed1e41e2

Description:

Motivation: The advent of high-throughput sequencing technologies and the availability of reference genomes have provided an unprecedented opportunity to discover and genotype millions of genetic variants in hundreds or even thousands of samples. Variant calling, the identification of genetic variants from raw sequencing data, is a time-consuming and computationally expensive process. Reference genomes are currently evolving rapidly, and new versions are released with increasing frequency. To take advantage of a new or improved reference genome, read alignment, genotype calling, and filtering must typically all be redone. This is a costly and time-consuming operation that is not always possible when projects are under time constraints.

Results: Here, we present SNPLift, a bioinformatic pipeline that can quickly transfer SNP coordinates from one version of a genome to another, making it possible to rapidly leverage the resources represented by new reference genomes. We tested SNPLift on...

The dataset contains nine species. For each species, it provides two genome versions and one VCF. The VCF contains SNPs whose positions refer to the older of the two reference genomes.

The authors of SNPLift used the data from these nine species to perform benchmarks, as described in the project's GitHub repository (https://github.com/enormandeau/snplift). That repository in turn points to the publication where the benchmark results are presented.
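Because each VCF in the dataset stores SNP positions relative to one specific genome version, it can be useful to check which sequence names a VCF refers to before working with it. The following is a minimal sketch of such a check, assuming only the standard tab-separated VCF layout in which the first column is CHROM. The function names and the in-memory example records are hypothetical; they are not taken from the dataset or from SNPLift.

```python
import gzip
import io
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_snps_per_chromosome(handle):
    """Count VCF records per CHROM value from an open text handle."""
    counts = Counter()
    for line in handle:
        # Skip meta-information lines (##), the header line (#CHROM), and blanks
        if line.startswith("#") or not line.strip():
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Hypothetical in-memory records; a real file would be read via open_vcf(path)
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t101\t.\tA\tG\t.\t.\t.\n"
    "chr1\t250\t.\tC\tT\t.\t.\t.\n"
    "chr2\t75\t.\tG\tA\t.\t.\t.\n"
)
counts = count_snps_per_chromosome(example)
print(dict(counts))
```

Comparing the resulting names against the sequence headers of the two genome FASTA files is one way to confirm which genome version a VCF was called against.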

Created:

2025-07-20
