Attack of the PCR clones: rates of clonality have little effect on RAD-seq genotype calls
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.3mq4631
Description:
Interpreting high-throughput sequence data requires an understanding of how decisions made during bioinformatic processing can influence results. One frequently cited source of bias is PCR clones (also called PCR duplicates). PCR clones are common in restriction site-associated DNA sequencing (RAD-seq) datasets, which are increasingly used in molecular ecology. To determine how PCR clones, and the bioinformatic handling of clones, influence genotyping, we evaluated four RAD-seq datasets. Each dataset was compared before and after clone removal in order to (i) estimate the number of clones present in RAD-seq data, (ii) quantify how often the presence of clones causes genotype calls to differ from those made after clones are removed, (iii) investigate the mechanisms that lead to genotype call changes, and (iv) test whether clones bias heterozygosity estimates. Our RAD-seq datasets contained 30-60% PCR clones, but 95% of RAD-tags had five or fewer clones. Relatively few genotypes changed once clones were removed (5-10%), and the vast majority of these changes (98%) were genotypes switching from a called to a no-call state or vice versa. PCR clones had a larger influence on genotype calls in individuals with low read depth but appeared to influence genotype calls similarly across all loci. Removing PCR clones reduced the number of called genotypes by 2% but had almost no influence on heterozygosity estimates. As such, while steps should be taken to limit PCR clones during library preparation, PCR clones are likely not a substantial source of bias for most RAD-seq studies.
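The before/after comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the dictionary layout keyed by (individual, locus), the use of `None` for a no-call, and the toy genotypes are all assumptions made for illustration. It tallies how genotype calls differ between a dataset with clones and the same dataset after clone removal, and computes observed heterozygosity over the called genotypes in each.

```python
from collections import Counter

# Hypothetical convention: a missing genotype call is stored as None,
# and a called genotype is a tuple of two alleles in sorted order.
NO_CALL = None


def classify_change(before, after):
    """Label how one genotype call differs before vs. after clone removal."""
    if before == after:
        return "unchanged"
    if before is NO_CALL:
        return "no_call_to_called"
    if after is NO_CALL:
        return "called_to_no_call"
    return "genotype_switch"


def summarize_changes(with_clones, without_clones):
    """Count change categories over the (individual, locus) keys shared by both datasets."""
    shared = with_clones.keys() & without_clones.keys()
    return Counter(
        classify_change(with_clones[key], without_clones[key]) for key in shared
    )


def observed_heterozygosity(calls):
    """Fraction of called genotypes whose two alleles differ."""
    called = [g for g in calls.values() if g is not NO_CALL]
    if not called:
        return float("nan")
    return sum(1 for a, b in called if a != b) / len(called)


# Toy data (invented for illustration, not taken from the dataset).
with_clones = {
    ("ind1", "loc1"): ("A", "A"),
    ("ind1", "loc2"): ("A", "G"),
    ("ind2", "loc1"): ("A", "G"),
    ("ind2", "loc2"): NO_CALL,
}
without_clones = {
    ("ind1", "loc1"): ("A", "A"),
    ("ind1", "loc2"): NO_CALL,  # read depth fell below the calling threshold
    ("ind2", "loc1"): ("A", "G"),
    ("ind2", "loc2"): NO_CALL,
}

summary = summarize_changes(with_clones, without_clones)
het_before = observed_heterozygosity(with_clones)
het_after = observed_heterozygosity(without_clones)
```

Separating called-to-no-call changes from true genotype switches mirrors the distinction drawn in the description, where most changes were of the former kind.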
Created:
2019-08-06



