Data from: A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them
Source: DataONE; updated: 2016-05-19; indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Resource summary:
Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in large numbers of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, because SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50: 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and the alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the walrus and Weddell seal genomes, suggesting that the genomes of species that diverged as much as 23 million years ago can hold information relevant to SNP validation outcomes. Additionally, re-analysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for improving validation rates, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modeling.
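The summary defines the validation rate as the proportion of putative SNPs verified to be polymorphic, and proposes a simple filter: keep only SNPs whose flanking sequences align uniquely and completely to a reference genome. The minimal Python sketch below illustrates that definition and filter. It is not the authors' pipeline. It assumes each SNP has already been summarised as a hypothetical `ProbeAlignment` record holding the number of probe-to-genome mappings, the aligned length and the full probe length. "Complete" is taken to mean that the aligned length is at least the probe length. The records in the usage example are made-up toy values, not data from this study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ProbeAlignment:
    """Hypothetical per-SNP summary of mapping a probe/flanking sequence to a genome."""
    snp_id: str
    n_mappings: int        # number of probe-to-genome mappings (hits)
    aligned_length: int    # length of the best alignment, in bp
    probe_length: int      # full length of the probe/flanking sequence, in bp
    polymorphic: Optional[bool] = None  # validation outcome; None if not genotyped


def validation_rate(probes: Iterable[ProbeAlignment]) -> float:
    """Proportion of genotyped putative SNPs that were verified to be polymorphic."""
    genotyped = [p for p in probes if p.polymorphic is not None]
    if not genotyped:
        raise ValueError("no genotyped SNPs to evaluate")
    return sum(p.polymorphic for p in genotyped) / len(genotyped)


def aligns_uniquely_and_completely(p: ProbeAlignment) -> bool:
    """Assumed filter rule: exactly one genome hit, covering the whole probe."""
    return p.n_mappings == 1 and p.aligned_length >= p.probe_length


def filter_unique_complete(probes: Iterable[ProbeAlignment]) -> List[ProbeAlignment]:
    """Keep only SNPs that pass the uniqueness/completeness filter."""
    return [p for p in probes if aligns_uniquely_and_completely(p)]


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    probes = [
        ProbeAlignment("snp1", n_mappings=1, aligned_length=120, probe_length=120, polymorphic=True),
        ProbeAlignment("snp2", n_mappings=3, aligned_length=120, probe_length=120, polymorphic=False),
        ProbeAlignment("snp3", n_mappings=1, aligned_length=80, probe_length=120, polymorphic=False),
        ProbeAlignment("snp4", n_mappings=1, aligned_length=120, probe_length=120, polymorphic=True),
    ]
    kept = filter_unique_complete(probes)
    print(validation_rate(probes))     # 0.5 for all four toy SNPs
    print([p.snp_id for p in kept])    # ['snp1', 'snp4']
    print(validation_rate(kept))       # 1.0 after filtering
```

In the toy example, SNPs with multiple genome hits (`snp2`) or a truncated alignment (`snp3`) are dropped before the array is designed. The predictive-modeling alternative mentioned in the summary would instead use `n_mappings` and `aligned_length` as predictors of `polymorphic`, rather than applying a hard cut-off.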
Created: 2016-05-19



