
ModEst - Precise estimation of genome size from NGS data

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.dr7sqvb0j
Description:
Accurate estimates of genome size are important parameters for both theoretical and practical biodiversity genomics. We present here a fast, easy-to-implement and precise method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take advantage of the fact that the Poisson distribution parameter lambda can be estimated precisely from truncated data, restricted to the part of the sequencing depth distribution that represents the true underlying distribution. Simulations showed that reasonable genome size estimates can be obtained even from low-coverage (10X), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow-cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results.

Methods

To illustrate the influence of factors such as sequencing depth, genome size, repeat content and repeat distribution on the different genome size estimation methods, we simulated five different genomes modelled on real examples. The latest genome assemblies and annotations of Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and Scophthalmus maximus were used to obtain distributions of the size of, and distance between, annotated repeat regions. Simulated genomes of the same size as the five genome assemblies mentioned above were then created using a custom Python tool, available at https://github.com/Croxa/Simulate-Genome. Regions annotated as repeat regions (rr) were filled with random repeat units of up to 10 bp in length, and high-complexity regions with random nucleotides. For simplicity, we simulated each genome on a single chromosome. A mean GC content of 0.5 was applied to both categories.
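The estimation idea described above (genome size as the number of bases sequenced divided by the mean depth, with the mean depth taken as the Poisson parameter lambda fitted to a truncated part of the depth distribution) can be illustrated with a minimal sketch. The description does not spell out the fitting procedure, so the sketch assumes a maximum-likelihood fit that matches the mean of a doubly truncated Poisson to the observed mean inside the window. The function names, the `depth_counts` histogram format and the window bounds `lo` and `hi` are hypothetical and are not taken from the ModEst code.

```python
import math


def truncated_poisson_mean(lam, lo, hi):
    """Mean of a Poisson(lam) variable conditioned on lo <= X <= hi."""
    ks = range(lo, hi + 1)
    # Work in log space so that large depths do not overflow.
    log_w = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in ks]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    return sum(k * wk for k, wk in zip(ks, w)) / sum(w)


def fit_lambda(depth_counts, lo, hi, iterations=200):
    """Estimate lambda from a depth histogram restricted to [lo, hi].

    depth_counts maps a sequencing depth to the number of positions
    observed at that depth (assumed input format). For a truncated
    Poisson, the maximum-likelihood lambda is the value whose truncated
    mean equals the observed mean inside the window; the truncated mean
    increases with lambda, so bisection finds it.
    """
    window = range(lo, hi + 1)
    n = sum(depth_counts.get(k, 0) for k in window)
    if n <= 0:
        raise ValueError("no observations inside the truncation window")
    observed = sum(k * depth_counts.get(k, 0) for k in window) / n
    if not lo < observed < hi:
        raise ValueError("observed mean must lie strictly inside the window")

    low, high = 1e-9, 1.0
    while truncated_poisson_mean(high, lo, hi) < observed:
        high *= 2.0  # widen the bracket until it contains the solution
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if truncated_poisson_mean(mid, lo, hi) < observed:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def estimate_genome_size(total_bases, depth_counts, lo, hi):
    """Genome size = number of bases sequenced / estimated mean depth."""
    return total_bases / fit_lambda(depth_counts, lo, hi)
```

In this sketch the window `[lo, hi]` is meant to bracket the main peak of the depth histogram, so that positions with zero coverage and collapsed repeats at very high depth do not influence the fit. How the window is chosen in practice is left open here.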
Created:
2022-01-25