Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki
Source: DataCite Commons. Updated 2025-05-01; indexed 2025-05-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.ghx3ffbs3
Description:
For any genome-based research, a robust genome assembly is required. De
novo assembly strategies have evolved with changes in DNA sequencing
technologies and have been through at least three phases: i) short-read
only, ii) short- and long-read hybrid, and iii) long-read only assemblies.
Each phase has its own error model. We hypothesized that hidden
scaffolding errors in short-read assembly and erroneous long-read contigs
degrade the quality of short- and long-read hybrid assemblies. We
assembled the genome of T. borchgrevinki from data generated during each
of the three phases and assessed the quality problems we encountered. We
developed strategies such as k-mer-assembled region replacement, parameter
optimization, and long-read sampling to address the error models. We
demonstrated that a k-mer-based strategy improved short-read assemblies as
measured by BUSCO while mate-pair libraries introduced hidden scaffolding
errors and perturbed BUSCO scores. Further, we found that although hybrid
assemblies can generate higher contiguity, they tend to suffer from lower
quality. In addition, we found long-read-only assemblies can be optimized
for contiguity by sub-sampling length-restricted raw reads. Our results
indicate that long-read contig assembly is currently the best choice and
that assemblies from phases i (short-read only) and ii (hybrid) were of lower quality.
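
The sub-sampling of length-restricted raw reads mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `subsample_reads`, the length bounds, the base-count target, and the seeded shuffle are all assumptions chosen for illustration, and reads are represented as plain strings rather than FASTQ records.

```python
import random


def subsample_reads(reads, min_len, max_len, target_bases, seed=0):
    """Hypothetical helper: length-restrict reads, then randomly sub-sample.

    Keeps reads with min_len <= length <= max_len, shuffles them with a
    seeded generator, and draws reads until at least target_bases bases
    are collected (or the eligible reads run out).
    """
    # Step 1: length restriction.
    eligible = [r for r in reads if min_len <= len(r) <= max_len]

    # Step 2: reproducible random order.
    rng = random.Random(seed)
    rng.shuffle(eligible)

    # Step 3: draw reads until the base-count target is reached.
    picked, total = [], 0
    for read in eligible:
        if total >= target_bases:
            break
        picked.append(read)
        total += len(read)
    return picked


# Toy input: five "reads" of lengths 5, 10, 20, 30, and 12.
reads = ["A" * 5, "A" * 10, "A" * 20, "A" * 30, "A" * 12]
picked = subsample_reads(reads, min_len=10, max_len=20, target_bases=25)
print(sorted(len(r) for r in picked))
```

In practice the bounds and the target would be varied across runs and each resulting assembly compared on contiguity and BUSCO score; the values used here are placeholders.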
Provider:
Dryad
Created:
2022-08-19



