Data from: Comparing methods for SNP calling from Genotyping-By-Sequencing (GBS) data for a large-genome conifer without a published genome sequence

Name: Data from: Comparing methods for SNP calling from Genotyping-By-Sequencing (GBS) data for a large-genome conifer without a published genome sequence
Creator: Dryad
Published: 2026-03-09 17:38:10
License: No description available

DataCite Commons; updated 2026-03-09; indexed 2026-04-25

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.6fv8fb4

Resource description:

Reduced-representation, restriction-enzyme-based sequencing methods have proven to be robust and cost-effective for identifying Single Nucleotide Polymorphisms (SNPs). Although aligning short-read fragments to a genome sequence of the same species yields better SNP calls than de novo approaches, only a few tree species, and few conifers in particular, have an annotated genome sequence. Many conifer genomes are huge (>19 Gbp) and contain a large proportion of repeat sequences, which makes assembly difficult. The genome sequence of a related species can be used instead, but choosing the proper SNP-calling pipeline remains challenging. Here we compare the performance of four bioinformatics pipelines: two that require a reference genome (TASSEL-GBS V2 and reference-based Stacks) and two de novo pipelines (UNEAK and Stacks de novo). We used Illumina GBS data from 94 ponderosa pines. Using the loblolly pine genome as the reference greatly increased the number of SNPs called, from 62-196 thousand with the de novo pipelines to 2.1-2.7 million with the reference-based pipelines. UNEAK was the fastest pipeline and identified more SNPs than Stacks de novo. Reference-based Stacks produced the highest number of SNPs with the lowest proportion of paralogs, whereas TASSEL-GBS V2 exhibited the highest proportion of paralogs. The Stacks reference-based approach produced the best results overall, while UNEAK was the better de novo method. However, all four pipelines had distinct benefits and limitations. Differences in observed and expected heterozygosity among the SNP sets generated by the pipelines could lead to different conclusions when those sets are used for population genetic analyses.
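
The description closes by contrasting observed and expected heterozygosity across SNP sets. As a rough illustration of what those two quantities are, the sketch below computes both for a single biallelic SNP. It is not taken from the dataset or from any of the four pipelines: the function name `heterozygosity` and the 0/1/2 genotype coding (copies of the alternate allele, `None` for a missing call) are assumptions made for this example, and expected heterozygosity is computed as the simple Hardy-Weinberg value 2p(1 - p) without a small-sample correction.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one biallelic SNP.

    genotypes: iterable of alternate-allele counts per individual
    (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate);
    None marks a missing call. This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this SNP")

    counts = Counter(called)
    # Observed heterozygosity: fraction of called individuals that are heterozygous.
    h_obs = counts[1] / n
    # Alternate-allele frequency across the 2n called allele copies.
    p = (2 * counts[2] + counts[1]) / (2 * n)
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 2 * p * (1 - p)
    return h_obs, h_exp


if __name__ == "__main__":
    # Hypothetical calls for eight individuals, one of them missing.
    h_obs, h_exp = heterozygosity([0, 0, 1, 1, 2, None, 1, 0])
    print(f"Ho = {h_obs:.3f}, He = {h_exp:.3f}")
```

Averaging these per-SNP values over each pipeline's SNP set is one simple way to compare the sets; the actual statistics used with this dataset may have been computed differently.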

Provider:

Dryad

Created:

2020-02-05
