Data from: Paralogs are revealed by proportion of heterozygotes and deviations in read ratios in genotyping by sequencing data from natural populations
Source: DataCite Commons | Updated: 2025-05-01 | Indexed: 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.cm08m
Description:
Whole genome duplications have occurred in the recent ancestors of many
plants, fish, and amphibians, resulting in a pervasiveness of paralogous
loci and the potential for both disomic and tetrasomic inheritance in the
same genome. Paralogs can be difficult to reliably genotype and are often
excluded from genotyping-by-sequencing (GBS) analyses; however, removal
requires paralogs to be identified, which is difficult without a reference
genome. We present a method for identifying paralogs in natural
populations by combining two properties of duplicated loci: 1) the
expected frequency of heterozygotes exceeds that for singleton loci, and
2) within heterozygotes, observed read ratios for each allele in GBS data
will deviate from the 1:1 expected for singleton (diploid) loci. These
deviations are often not apparent within individuals, particularly when
sequence coverage is low, but we postulated that summing allele reads for
each locus over all heterozygous individuals in a population would provide
sufficient power to detect deviations at those loci. We identified
paralogous loci in three species: Chinook salmon (Oncorhynchus
tshawytscha), which retains regions with ongoing residual tetrasomy on
eight chromosome arms following a recent whole genome duplication;
mountain barberry (Berberis alpina), which has a large proportion of
paralogs that arose through an unknown mechanism; and dusky parrotfish
(Scarus niger), which has largely re-diploidized following an ancient whole
genome duplication. Importantly, this approach requires only the genotype
and allele-specific read counts for each individual, information which is
readily obtained from most GBS analysis pipelines.
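
The two properties described above can be illustrated with a minimal Python sketch. It is not the authors' code: the input layout (one `(genotype, reads_a, reads_b)` tuple per individual), the function name `locus_summary`, and the use of a normal approximation to the binomial for the read-ratio test are all assumptions made for illustration. For one locus it reports the proportion of heterozygotes and a z-score for the deviation of the summed allele reads from the 1:1 ratio expected at a singleton locus.

```python
import math

def locus_summary(calls):
    """Summarise one locus from per-individual genotype calls.

    calls: list of (genotype, reads_a, reads_b) tuples, where genotype is
    "AA", "AB" or "BB" (hypothetical layout; adapt to your pipeline output).
    Returns (heterozygote proportion, z-score, two-sided p-value).
    """
    genotyped = [c for c in calls if c[0] in ("AA", "AB", "BB")]
    hets = [c for c in genotyped if c[0] == "AB"]
    het_prop = len(hets) / len(genotyped) if genotyped else float("nan")

    # Sum allele-specific reads over all heterozygous individuals.
    reads_a = sum(c[1] for c in hets)
    reads_b = sum(c[2] for c in hets)
    total = reads_a + reads_b
    if total == 0:
        return het_prop, float("nan"), float("nan")

    # Under a 1:1 expectation, reads_a ~ Binomial(total, 0.5).
    # Normal approximation (an assumption of this sketch).
    z = (reads_a - 0.5 * total) / math.sqrt(total * 0.25)
    p = math.erfc(abs(z) / math.sqrt(2))
    return het_prop, z, p

# Made-up example data: a balanced locus and a locus with 3:1 read ratios.
balanced = [("AB", 10, 10), ("AA", 20, 0), ("BB", 0, 20), ("AB", 12, 8)]
skewed = [("AB", 30, 10)] * 10

h1, z1, p1 = locus_summary(balanced)
h2, z2, p2 = locus_summary(skewed)
```

In this toy data the second locus combines a high heterozygote proportion with a large read-ratio deviation, which is the pattern the description associates with paralogs. Any thresholds for flagging loci would have to be chosen per dataset and are not specified here.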
Provider:
Dryad
Created:
2016-10-18



