Data from: Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping by sequencing data from natural populations
DataONE | Updated 2016-10-18 | Indexed 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
Whole genome duplications have occurred in the recent ancestors of many plants, fish, and amphibians, resulting in a pervasiveness of paralogous loci and the potential for both disomic and tetrasomic inheritance in the same genome (mixed ploidy). Paralogs can be difficult to reliably genotype and are often excluded from genotyping-by-sequencing (GBS) analyses; however, identification of paralogs is difficult without a reference genome. We present a method for identifying paralogs in natural populations by combining two properties of duplicated loci: 1) the expected frequency of heterozygotes exceeds that for singleton loci, and 2) within heterozygotes, observed read ratios for each allele in GBS data will deviate from the 1:1 expected for singleton (diploid) loci. These deviations are often not apparent within individuals, particularly when sequence coverage is low, but summing allele reads over all heterozygous individuals in a population should provide sufficient power to detect deviations. We identified paralogous loci in three species: Chinook salmon (Oncorhynchus tshawytscha) which retains regions with ongoing residual tetrasomy on eight chromosome arms following a recent whole genome duplication, mountain barberry (Berberis alpina) which has a large proportion of paralogs that arose through an unknown mechanism, and dusky parrotfish (Scarus niger) which has largely re-diploidized following an ancient whole genome duplication. Accuracy of our method was confirmed by comparing inferred and known copy-status for a subset of Chinook salmon loci. Importantly, this approach only requires the genotype and allele-specific read counts for each individual, information which is readily obtained from most GBS analysis pipelines.
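The two signals described above (proportion of heterozygotes, and allele read ratio summed over all heterozygotes) can be sketched per locus roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `"AA"`/`"AB"`/`"BB"` genotype coding, the binomial z-score against a 1:1 expectation, and the cutoff values are all assumptions made for the example. The only threshold with a theoretical basis here is 0.5, the maximum heterozygote frequency for a biallelic singleton locus under Hardy-Weinberg proportions.

```python
from math import sqrt, nan, isnan


def locus_summary(genotypes, a_reads, b_reads):
    """Summarise one locus across individuals.

    genotypes: per-individual calls, "AA", "AB", "BB", or None if missing
               (hypothetical coding chosen for this sketch).
    a_reads, b_reads: per-individual read counts for allele A and allele B.
    Returns (proportion of heterozygotes, pooled A read ratio, z-score).
    """
    called = [g for g in genotypes if g is not None]
    het_idx = [i for i, g in enumerate(genotypes) if g == "AB"]
    het_prop = len(het_idx) / len(called) if called else nan

    # Pool allele-specific reads over heterozygotes only, so that low
    # per-individual coverage still yields a usable total.
    a = sum(a_reads[i] for i in het_idx)
    b = sum(b_reads[i] for i in het_idx)
    n = a + b
    if n == 0:
        return het_prop, nan, nan

    ratio = a / n
    # Deviation from the 1:1 expectation, assuming binomial sampling of
    # reads with p = 0.5 (an assumption of this sketch).
    z = (a - 0.5 * n) / sqrt(0.25 * n)
    return het_prop, ratio, z


def is_candidate_paralog(het_prop, z, max_het=0.5, z_cutoff=3.0):
    """Flag a locus on either signal; both cutoffs are illustrative."""
    if isnan(het_prop) or isnan(z):
        return False
    return het_prop > max_het or abs(z) > z_cutoff
```

For example, a locus with pooled heterozygote reads of 30 for allele A and 10 for allele B gives a z-score of about 3.16 and would be flagged under the illustrative cutoff, while 22 versus 18 reads gives about 0.63 and would not. In practice the cutoffs would need to be calibrated against loci of known copy status.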
Created:
2016-10-18



