Data from: Single-nucleotide polymorphism discovery and validation in high-density SNP array for genetic analysis in European white oaks
Source: DataONE. Updated: 2015-03-24. Indexed: 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
An Illumina Infinium SNP genotyping array was constructed for European white oaks. Six individuals of Quercus petraea and Q. robur were considered for SNP discovery using both previously obtained Sanger sequences across 676 gene regions (1371 in vitro SNPs) and Roche 454 technology sequences from 5112 contigs (6542 putative in silico SNPs). The 7913 SNPs were genotyped across the six parental individuals, full-sib progenies (one within each species and two interspecific crosses between Q. petraea and Q. robur) and three natural populations from south-western France that included two additional interfertile white oak species (Q. pubescens and Q. pyrenaica). The genotyping success rate in mapping populations was 80.4% overall and 72.4% for polymorphic SNPs. In natural populations, these figures were lower (54.8% and 51.9%, respectively). Illumina genotype clusters with compression (shift of clusters on the normalized x-axis) were detected in ~25% of the successfully genotyped SNPs and may be due to the presence of paralogues. Compressed clusters were significantly more frequent for SNPs showing a priori incorrect Illumina genotypes, suggesting that they should be considered with caution or discarded. Altogether, these results show a high experimental error rate for the Infinium array (between 15% and 20% of SNPs potentially unreliable and 10% when excluding all compressed clusters), and recommendations are proposed when applying this type of high-throughput technique. Finally, results on diversity levels and shared polymorphisms across targeted white oaks and more distant species of the Quercus genus are discussed, and perspectives for future comparative studies are proposed.
Created:
2015-03-24



