Supporting data for "Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data"

Source: Mendeley Data. Updated: 2024-01-31. Indexed: 2024-06-27.
Download link:
http://gigadb.org/dataset/100988
Description:
The site frequency spectrum summarises the distribution of allele frequencies throughout the genome, and it is widely used as a summary statistic to infer demographic parameters and to detect signals of natural selection. The use of high-throughput low-coverage DNA sequencing data can lead to biased estimates of the site frequency spectrum due to high levels of uncertainty in genotyping. Here we design and implement a method to efficiently and accurately estimate the multidimensional site frequency spectrum for large numbers of haploid or diploid individuals across an arbitrary number of populations, using low-coverage sequencing data. The method maximises a likelihood function that represents the probability of the sequencing data observed given a multi-dimensional site frequency spectrum using genotype likelihoods. Notably, it uses an advanced binning heuristic paired with an accelerated expectation-maximisation algorithm for a fast and memory-efficient computation, and can generate both unfolded and folded spectra and bootstrapped replicates for haploid and diploid genomes. Based on extensive simulations, we show that the new method requires remarkably less storage and is faster than previous implementations whilst retaining the same accuracy. When applied to low-coverage sequencing data from the fungal pathogen Neonectria neomacrospora, results recapitulate the patterns of population differentiation generated using the original high-coverage data. The new implementation allows for accurate estimation of population genetic parameters from arbitrarily large, low-coverage data sets, thus facilitating cost-effective sequencing experiments in model and non-model organisms.
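The likelihood maximisation described above can be illustrated with a plain expectation-maximisation loop for a single population. This is only a minimal sketch, not the authors' implementation: it omits the binning heuristic and the acceleration step, and it handles one dimension rather than several. The function name `em_sfs` and the input layout (one row per site, one column per possible derived-allele count, each entry the likelihood of the sequencing data at that site given that count) are assumptions made for illustration.

```python
def em_sfs(site_likelihoods, n_iter=200, tol=1e-10):
    """Plain EM estimate of a one-dimensional SFS (hypothetical helper).

    site_likelihoods[s][k] is assumed to hold
    P(data at site s | k derived alleles in the sample).
    Each row must contain at least one positive entry.
    Returns a list of proportions that sums to 1.
    """
    n_sites = len(site_likelihoods)
    n_cats = len(site_likelihoods[0])
    sfs = [1.0 / n_cats] * n_cats  # uniform starting point

    for _ in range(n_iter):
        expected = [0.0] * n_cats
        # E-step: posterior probability of each allele count at each site
        for row in site_likelihoods:
            weights = [p * lik for p, lik in zip(sfs, row)]
            total = sum(weights)
            for k, w in enumerate(weights):
                expected[k] += w / total
        # M-step: the new SFS is the average posterior across sites
        new_sfs = [e / n_sites for e in expected]
        change = max(abs(a - b) for a, b in zip(new_sfs, sfs))
        sfs = new_sfs
        if change < tol:
            break
    return sfs


if __name__ == "__main__":
    # Toy input: 4 sites, 3 allele-count categories, made-up likelihoods
    toy = [
        [0.90, 0.08, 0.02],
        [0.70, 0.25, 0.05],
        [0.10, 0.80, 0.10],
        [0.05, 0.15, 0.80],
    ]
    print(em_sfs(toy))
```

When every site is perfectly informative (one likelihood equal to 1 and the rest 0), the estimate reduces to the observed proportion of sites in each category. With low-coverage data the rows are diffuse, and the iterations reweight them by the current spectrum.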
Created:
2024-01-31