Data from: Dealing with paralogy in RADseq data: in silico detection and single nucleotide polymorphism validation in Robinia pseudoacacia L.
DataCite Commons, updated 2025-06-01, indexed 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.qn4br
Description:
The RADseq technology allows researchers to efficiently develop thousands
of polymorphic loci across multiple individuals with little or no prior
information on the genome. However, many questions remain about the biases
inherent to this technology. Notably, sequence misalignments arising from
paralogy may affect the development of single nucleotide polymorphism
(SNP) markers and the estimation of genetic diversity. We evaluated the
impact of putative paralog loci on genetic diversity estimation during the
development of SNPs from a RADseq dataset for the nonmodel tree species
Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the
frequency of putative paralogous RAD loci as a function of both the depth
of coverage and the mismatch threshold allowed between loci. Putative
paralogy was detected in a highly variable proportion of loci, from 1% to more
than 20%, with the depth of coverage having a major influence on the
result. Putative paralogy artificially increased the observed degree of
polymorphism and resulting estimates of diversity. The choice of the depth
of coverage also affected diversity estimation and SNP validation: A low
threshold decreased the chances of detecting minor alleles while a high
threshold increased allelic dropout. SNP validation was better for the low
threshold (4×) than for the high threshold (18×) we tested. Using the
strategy developed here, we were able to validate more than 80% of the
SNPs tested by means of individual genotyping, resulting in a readily
usable set of 330 SNPs, suitable for use in population genetics
applications.
Provider:
Dryad
Created:
2017-03-16



