Data from: SNP discovery in non-model organisms: strand-bias and base-substitution errors reduce conversion rates
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.n3bb2
Description:
Single nucleotide polymorphisms (SNPs) have become the marker of choice
for genetic studies in organisms of conservation, commercial or biological
interest. Most SNP discovery projects in nonmodel organisms apply a
strategy for identifying putative SNPs based on filtering rules that
account for random sequencing errors. Here, we analyse data used to
develop 4723 novel SNPs for the commercially important deep-sea fish,
orange roughy (Hoplostethus atlanticus), to assess the impact of not
accounting for systematic sequencing errors when filtering identified
polymorphisms during SNP discovery. We used SAMtools to identify
polymorphisms in a Velvet assembly of genomic DNA sequence data from seven
individuals. The resulting set of polymorphisms was filtered to minimize
‘bycatch’, that is, polymorphisms caused by sequencing or assembly error. An
Illumina Infinium SNP chip was used to genotype a final set of 7714
polymorphisms across 1734 individuals. Five predictors were examined for
their effect on the probability of obtaining an assayable SNP: depth of
coverage, number of reads that support a variant, polymorphism type (e.g.
A/C), strand-bias and Illumina SNP probe design score. Our results
indicate that filtering out systematic sequencing errors could
substantially improve the efficiency of SNP discovery. We show that BLASTX
can be used as an efficient tool to identify single-copy genomic regions
in the absence of a reference genome. The results have implications for
research aiming to identify assayable SNPs and build SNP genotyping assays
for nonmodel organisms.
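
This record does not state how strand-bias was quantified. As an illustration only, the sketch below uses one common approach: a two-sided Fisher's exact test on the counts of forward and reverse reads supporting the reference and alternate alleles. The function names, the input layout and the 0.05 threshold are assumptions made for this example and are not taken from the dataset or the associated paper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


def passes_strand_filter(ref_fwd, ref_rev, alt_fwd, alt_rev, alpha=0.05):
    """Hypothetical filter: keep a candidate SNP unless strand bias is significant.

    alpha is an illustrative threshold, not a value from the source.
    """
    p_value = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return p_value >= alpha


if __name__ == "__main__":
    # Variant reads spread evenly across both strands: kept.
    print(passes_strand_filter(10, 10, 5, 5))
    # Variant reads seen on the forward strand only: flagged as strand-biased.
    print(passes_strand_filter(10, 10, 8, 0))
```

A candidate whose supporting reads come almost entirely from one strand yields a small p-value and is dropped, while a candidate supported on both strands is retained. In practice the threshold would need tuning against genotyping outcomes such as the conversion rates described above.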
Provider:
Dryad
Created:
2015-05-22



