Data from: How to optimize the precision of allele and haplotype frequency estimates using pooled-sequencing data
DataCite Commons | Updated 2025-06-01 | Indexed 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.cr65v
Description:
Sequencing pools of individuals rather than individuals separately reduces
the costs of estimating allele frequencies at many loci in many
populations. Theoretical and empirical studies show that sequencing pools
comprising a limited number of individuals (typically fewer than 50)
provides reliable allele frequency estimates, provided that the DNA
pooling and DNA sequencing steps are carefully controlled. Unequal
contributions of different individuals to the DNA pool and the mean and
variance in sequencing depth can both affect the standard error of allele
frequency estimates. To our knowledge, no study has separately investigated
the effect of these two factors on allele frequency estimates, so
there is currently no method to estimate a priori the relative importance
of unequal individual DNA contributions independently of sequencing depth.
We develop a new analytical model for allele frequency estimation that
explicitly distinguishes these two effects. Our model shows that the DNA
pooling variance in a pooled sequencing experiment depends solely on two
factors: the number of individuals within the pool and the coefficient of
variation of individual DNA contributions to the pool. We present a new
method to experimentally estimate this coefficient of variation when
planning a pooled sequencing design where samples are either pooled before
or after DNA extraction. Using this analytical and experimental framework,
we provide guidelines to optimize the design of pooled sequencing
experiments. Finally, we sequence replicated pools of inbred lines of the
plant Medicago truncatula and show that the predictions from our model
generally hold true when estimating the frequency of known multilocus
haplotypes using pooled sequencing.
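
The two sources of error described above can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical model or code. It assumes haploid (or fully inbred) individuals, gamma-distributed DNA contributions with mean 1 and a chosen coefficient of variation, and a fixed sequencing depth with binomial read sampling. The function name `pooled_se` and all of its parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def pooled_se(n_individuals, n_carriers, cv, depth, n_reps=2000, seed=0):
    """Simulated standard error of a pooled allele frequency estimate.

    Illustrative assumptions (not taken from the dataset or the paper):
    - n_carriers of the n_individuals carry the allele (haploid or inbred lines);
    - individual DNA contributions are gamma distributed with mean 1 and
      coefficient of variation cv (cv = 0 means perfectly equal contributions);
    - every replicate is sequenced at the same fixed depth, and reads are
      sampled binomially from the pool.
    """
    rng = random.Random(seed)
    true_freq = n_carriers / n_individuals
    squared_error = 0.0
    for _ in range(n_reps):
        if cv > 0:
            shape = 1.0 / cv ** 2  # a gamma with this shape has CV equal to cv
            weights = [rng.gammavariate(shape, cv ** 2) for _ in range(n_individuals)]
        else:
            weights = [1.0] * n_individuals
        # Allele frequency in the DNA pool after unequal contributions.
        pool_freq = sum(weights[:n_carriers]) / sum(weights)
        # Binomial sampling of reads carrying the allele.
        alt_reads = sum(rng.random() < pool_freq for _ in range(depth))
        squared_error += (alt_reads / depth - true_freq) ** 2
    return math.sqrt(squared_error / n_reps)


if __name__ == "__main__":
    for cv in (0.0, 0.5, 1.0):
        for depth in (50, 200):
            se = pooled_se(n_individuals=20, n_carriers=6, cv=cv, depth=depth)
            print(f"cv={cv:.1f} depth={depth:3d} simulated SE={se:.4f}")
```

Under these assumptions, raising the depth shrinks only the read-sampling part of the error, while the part caused by unequal contributions stays fixed for a given pool size and coefficient of variation.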
Provider:
Dryad
Created:
2017-10-02



