When is an SNP not an SNP?

Taylor & Francis Group, updated 2024-09-12, indexed 2026-04-16
Download link:
https://tandf.figshare.com/articles/dataset/When_is_an_SNP_not_an_SNP_/27003526/1
Description:
Genomic duplications are important sources of structural change and gene innovation. In humans, the most recent and highly identical sequences (>90% homology, >1 kb long) are known as segmental duplications (SDs). Single-nucleotide variants or single-nucleotide polymorphisms within SDs have not been systematically assessed because of limitations in mapping short-read sequencing data. Single-nucleotide variant rs62486260 was flagged in a study of familial renal stone disease, but it was unclear whether it was real or an artifact resulting from the presence of an SD. We describe in silico and wet-lab approaches to investigate this, using segment-specific long-PCR assays followed by short PCR for Sanger sequencing. Our conclusion was that rs62486260 is an artifact. Our approach can be generalized to deal with other such situations.

The method described includes a two-step procedure for determining whether an apparent single-nucleotide polymorphism may be an artifact resulting from the presence of a duplicated genomic region/pseudogene. Step one involves identifying sequence differences between the two duplicated regions and designing a long PCR assay to specifically amplify each region separately. Step two involves amplifying a short PCR product that flanks the single-nucleotide polymorphism of interest from the long products generated in step one.

Genomic duplications have long been recognized as important sources of structural alterations and gene innovation. Single-nucleotide variants (SNVs) or single-nucleotide polymorphisms (SNPs) within segmental duplications (SDs) have not been systematically assessed because of limitations in mapping short-read sequencing data. SNV rs62486260 was flagged in a study of familial renal stone disease, but it was unclear whether it was real or an artifact resulting from the presence of an SD.
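Step one of the two-step procedure hinges on finding the bases at which the two duplicated copies differ, so that segment-specific primers can be anchored on them. The Python sketch below is only an illustration under stated assumptions: it takes two copies that are already aligned without gaps, and it proposes copy-specific forward primers whose 3'-terminal base sits on a difference, a common heuristic for allele- or paralog-specific priming. The function names, the 20-base default primer length and the toy sequences are hypothetical and are not the authors' design; a real assay would also weigh melting temperature, reverse primers and off-target hits.

```python
def paralog_differences(copy_a: str, copy_b: str) -> list[int]:
    """0-based positions at which two pre-aligned, gap-free copies differ.

    Assumption: the two duplicated regions have already been aligned
    to the same length with no indels.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("copies must be aligned to the same length (no gaps)")
    a, b = copy_a.upper(), copy_b.upper()
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def specific_forward_primers(copy_a: str, copy_b: str,
                             length: int = 20) -> list[tuple[int, str, int]]:
    """Hypothetical helper: candidate copy-A-specific forward primers.

    Each candidate is (start, primer_sequence, n_differences). Its 3'-terminal
    base lies on a position that differs between the copies; n_differences
    counts every difference inside the primer window. Candidates with more
    differences are listed first (ties keep their positional order).
    """
    diffs = paralog_differences(copy_a, copy_b)
    candidates = []
    for pos in diffs:
        start = pos - length + 1
        if start < 0:
            continue  # not enough upstream sequence for a full-length primer
        n_diff = sum(1 for d in diffs if start <= d <= pos)
        candidates.append((start, copy_a[start:pos + 1].upper(), n_diff))
    candidates.sort(key=lambda c: c[2], reverse=True)  # stable, so ties keep order
    return candidates
```

For example, with the toy copies `"ACGTACGTAC"` and `"ACGTTCGTAA"`, `paralog_differences` returns `[4, 9]` and `specific_forward_primers(..., length=5)` returns `[(0, 'ACGTA', 1), (5, 'CGTAC', 1)]`.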
We describe in silico and wet-lab approaches to investigate this, using segment-specific long-PCR assays followed by short PCR for Sanger sequencing. The method described includes a two-step procedure (in silico and wet-lab analysis) for determining whether an apparent SNP may be an artifact resulting from the presence of a duplicated genomic region/pseudogene. Step one (in silico analysis) involves identifying sequence differences between the two duplicated regions and designing a long PCR assay to specifically amplify each region separately. Step two (wet-lab analysis) involves amplifying a short PCR product that flanks the SNP of interest from the long products generated in step one, followed by Sanger sequencing.

Problems with rs62486260 are noted in both the gnomAD and UCSC databases. The discrepancy between the allele frequency (AF) reported for this variant by UK Biobank (UKBB; ∼0.45) and by gnomAD (∼0.02) hinted at an issue with the variant. In silico PCR against hg19 using the short-product primers predicted three separate products, all of identical size. Using the same approach, in silico PCR against hg38 also revealed three matches to chr7, but only two matches to the ‘fixed’ chromosome chr7_KZ208912v1_fix. Analysis of the telomere-to-telomere (T2T) CHM13v2.0/hs1 assembly revealed two hits for the primers. Interestingly, rs62486260 is not reported in the two more recent versions, T2T CHM13v2.0/hs1 and chr7_KZ208912v1_fix. Long-read sequencing data analysis showed that the ‘SNV’ of interest, when present, lies in the region overlying a pseudogene. Genotyping of the samples confirmed that rs62486260 is an artifact due to the presence of a pseudogene. Based on the latest human genome freeze [Jan. 2022 (T2T CHM13v2.0/hs1)], the centromeric region contains the gene TCAF2, whereas the telomeric region contains the pseudogene.
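The in silico PCR checks in the results (counting how many loci one primer pair hits in hg19, hg38 and T2T CHM13v2.0/hs1) can be illustrated with a minimal exact-match sketch. It is not a reimplementation of any particular tool: real in silico PCR services search whole assemblies and may tolerate some primer mismatches, whereas this version assumes perfect matches on a single toy template. The function names, the `max_size` default and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """All (possibly overlapping) 0-based start positions of pattern in text."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_size: int = 4000) -> list[tuple[int, int, int]]:
    """Hypothetical exact-match in silico PCR on one template sequence.

    Returns (start, end, size) for every placement where one primer matches
    the plus strand as written and the other primer's reverse complement
    matches downstream, bounding a product of at most max_size bases.
    Both primer orientations are tried, since a duplicated copy may lie
    in either orientation.
    """
    template = template.upper()
    products = []
    for left_primer, right_primer in ((fwd, rev), (rev, fwd)):
        left_site = left_primer.upper()             # matches the plus strand as written
        right_site = revcomp(right_primer.upper())  # plus-strand image of the other primer
        for f in find_all(template, left_site):
            for r in find_all(template, right_site):
                end = r + len(right_site)
                size = end - f
                # Binding sites must not overlap, and the product must fit.
                if r >= f + len(left_site) and size <= max_size:
                    products.append((f, end, size))
    return products
```

With the toy template `copy + "T" * 10 + copy`, where `copy = "AAACCC" + "TAGCTAGC" + "GTGTGT"`, the call `in_silico_pcr(template, "AAACCC", "ACACAC", max_size=30)` returns `[(0, 20, 20), (30, 50, 20)]`: two products of identical size, one from each copy, so product size alone cannot tell the copies apart. Raising `max_size` to 100 also admits a 50-base product spanning both copies.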
Pseudogenes located in SDs are a hidden peril when determining the likely clinical significance of SNPs reported from genomic sequencing. The observed ‘SNP’ actually lies within a pseudogene and is therefore much less likely to be causally associated with the phenotype of interest.
Provided by:
Leggatt, Gary P; Jalilzadeh, Shapour; Walker, Valerie; Hatchwell, Eli
Created:
2024-09-12