Data from: Genotyping-by-sequencing for estimating relatedness in non-model organisms: avoiding the trap of precise bias

Source: DataONE | Updated: 2017-12-08 | Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
There has been remarkably little attention to using the high resolution provided by genotyping-by-sequencing (i.e. RADseq and similar methods) datasets for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of dataset that could lead to downward-biased, yet precise, estimates of relatedness. Here we assess the applicability of genotyping-by-sequencing datasets for relatedness inferences given their relatively high genotyping error rates. Individuals of known relatedness were simulated under genotyping error, allelic dropout, and missing data scenarios based on an empirical ddRAD dataset, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (1996) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP dataset with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite dataset of tens of markers. The simulation-based approach used here can be easily implemented by others on their own genotyping-by-sequencing datasets to confirm the most appropriate and powerful estimator for their dataset.
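The point that a strong correlation between estimated and true relatedness does not guarantee accurate estimates can be illustrated with a small, purely synthetic sketch. The two "estimators" below are hypothetical stand-ins, not any of the seven estimators evaluated in the study: one is precise but downward-biased (values shrunk by an assumed factor of 0.6), the other is unbiased but noisier. The shrinkage factor, noise levels, sample size, and relatedness categories are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rmse(truth, est):
    """Root-mean-square error of estimates against true values."""
    return (sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth)) ** 0.5


def mean_bias(truth, est):
    """Average signed error; negative means downward bias."""
    return statistics.fmean(e - t for t, e in zip(truth, est))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# True pairwise relatedness: unrelated, half-sibs, full-sibs/parent-offspring.
true_r = [rng.choice([0.0, 0.25, 0.5]) for _ in range(300)]

# Hypothetical estimator A: precise but downward-biased (assumed shrinkage 0.6).
est_a = [0.6 * r + rng.gauss(0, 0.01) for r in true_r]

# Hypothetical estimator B: unbiased but noisier.
est_b = [r + rng.gauss(0, 0.05) for r in true_r]

for name, est in (("A (precise, biased)", est_a), ("B (unbiased, noisy)", est_b)):
    print(
        f"{name}: corr={pearson(true_r, est):.3f} "
        f"bias={mean_bias(true_r, est):+.3f} rmse={rmse(true_r, est):.3f}"
    )
```

In this toy setting, estimator A correlates more strongly with the truth than estimator B, yet its estimates sit further from the true values. This is why comparing estimators on correlation alone can be misleading, and why an error measure such as bias or RMSE is worth reporting alongside it.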

Created:
2017-12-08