Supporting data for "Substantial GC-bias impacts genomic and metagenomic reconstructions, significantly underrepresenting GC-poor organisms"
Source: DataCite Commons. Updated 2025-05-26. Indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/100696
Description:
Metagenomic sequencing is a well-established tool in the modern biosciences. While it promises unparalleled insights into the genetic content of the biological samples studied, the conclusions drawn are at risk from biases inherent to DNA sequencing methods, including abundance estimates that are inaccurate as a function of genomic GC content. We explored such GC-biases across many commonly used platforms in experiments sequencing multiple genomes (with mean GC contents ranging from 28.9% to 62.4%) and metagenomes. GC-bias profiles varied among library preparation protocols and sequencing platforms. We found that our workflows employing MiSeq and NextSeq suffered major GC-biases, with problems becoming increasingly severe outside the 45-65% GC range. This led to falsely low coverage in GC-rich and especially GC-poor sequences: genomic windows with 30% GC content had over 10-fold less coverage than windows close to 50% GC content. We also showed that GC content correlates very tightly with coverage biases. The PacBio and HiSeq platforms showed GC-bias profiles that were similar to each other but distinct from those seen in the MiSeq and NextSeq workflows. The Oxford Nanopore workflow was not afflicted with GC-bias. These findings indicate potential sources of difficulty in genome sequencing arising from GC-biases, which could be pre-emptively addressed with methodological optimisations, provided that the GC-biases inherent to the relevant workflow are understood. Furthermore, we recommend a more critical approach to quantitative abundance estimates in metagenomic studies. In the future, metagenomic studies should take steps to account for the effects of GC-bias before drawing conclusions, or they should employ a demonstrably unbiased workflow.
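
The comparison described above (coverage of genomic windows at about 30% GC relative to windows near 50% GC) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the window size, the 5-point GC bins, and the choice of the 50% bin as the reference are all assumptions made for illustration. The per-base depth list is assumed to come from some upstream read-mapping step that is not shown.

```python
from collections import defaultdict


def gc_percent(seq):
    """GC percentage over unambiguous bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100 * gc / total if total else None


def coverage_by_gc(sequence, depth, window=500, bin_width=5):
    """Mean read depth per GC bin, using non-overlapping windows.

    `depth` holds one value per base of `sequence` (hypothetical input).
    Keys of the result are the lower edge of each GC bin, in percent.
    """
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    bins = defaultdict(list)
    for start in range(0, len(sequence) - window + 1, window):
        gc = gc_percent(sequence[start:start + window])
        if gc is None:  # window made up only of ambiguous bases
            continue
        mean_depth = sum(depth[start:start + window]) / window
        bin_start = int(gc // bin_width) * bin_width
        bins[bin_start].append(mean_depth)
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}


def relative_coverage(by_gc, reference_bin=50):
    """Coverage of every GC bin divided by the coverage of the reference bin."""
    reference = by_gc[reference_bin]
    return {b: cov / reference for b, cov in by_gc.items()}


# Toy data: one 30% GC window at depth 2 and one 50% GC window at depth 20.
toy_sequence = "AAAAAAAGGG" + "ACGTACGTAC"
toy_depth = [2] * 10 + [20] * 10
toy_profile = coverage_by_gc(toy_sequence, toy_depth, window=10)
toy_relative = relative_coverage(toy_profile)
```

In this toy input the 30% GC window has one tenth of the depth of the 50% GC window, so `toy_relative[30]` is about 0.1, which corresponds to a 10-fold coverage deficit. Real data would need many windows per bin before such a ratio is meaningful.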
Provider:
GigaScience Database
Created:
2020-01-14



