Cosmopolites sordidus genome assemblies
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.f1vhhmh2r
Description:
PacBio HiFi sequencing was combined with metagenomic binning to produce a high-quality reference genome of Cosmopolites sordidus. We compared k-mer-based and alignment-based reference approaches for pre-binning and post-binning to remove contamination. We also asked whether the post-binning approach left bacterial contamination interspersed within intragenic regions of Arthropoda-binned contigs. Our analyses identified 3,433 genes composed of reads classified as being of putative bacterial origin. The pre-binning approach yielded a 1.07 Gb C. sordidus genome composed of 3,089 contigs, with genome and protein BUSCO scores (complete and single-copy) of 98.6% and 97.1%, respectively. In this paper, we demonstrate that, in this case, the pre-binning approach does not sacrifice assembly quality in exchange for more stringent metagenomic filtering. We also found that post-binning allows intragenic contamination, the extent of which increased with increasing coverage, whereas the frequency of gene contamination increased at lower coverage. Finally, NCBI's new FCS-GX program, used as a final post-assembly classification step, identified contamination in both the pre- and post-binning assemblies. This indicates that both pre- and post-binning approaches are required to fully remove contamination. Future work should focus on developing reference-free pre-binning approaches for HiFi reads from eukaryote-derived metagenomic samples.
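The k-mer, reference-based pre-binning compared above can be illustrated with a toy Python sketch. It assumes each read is assigned, before assembly, to whichever reference taxon shares the most canonical k-mers with it, subject to a hypothetical minimum-fraction threshold (`min_frac`); only reads binned to Arthropoda are kept for assembly. The function names, the toy k-mer size, the threshold, and the sequences are illustrative assumptions, not the classifier or parameters used for this dataset.

```python
# Toy sketch of k-mer, reference-based pre-binning of HiFi reads (hypothetical;
# not the authors' tool). Reads are assigned to a reference taxon before assembly,
# and only reads binned to the target taxon would be passed to the assembler.
from __future__ import annotations

from collections import Counter

K = 5  # toy k-mer size chosen for readability; real classifiers use much larger k
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_set(seq: str, k: int = K) -> set[str]:
    """All canonical k-mers of seq, so reads from either strand match the reference."""
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def build_reference(refs: dict[str, list[str]], k: int = K) -> dict[str, set[str]]:
    """Map each taxon label to the union of canonical k-mers of its reference sequences."""
    return {taxon: set().union(*(kmer_set(s, k) for s in seqs)) for taxon, seqs in refs.items()}


def classify_read(read: str, ref: dict[str, set[str]], min_frac: float = 0.5, k: int = K) -> str:
    """Assign a read to the taxon sharing the most k-mers with it.

    The read stays 'unclassified' unless the best taxon accounts for at least
    min_frac of the read's distinct k-mers (a hypothetical threshold).
    """
    read_kmers = kmer_set(read, k)
    if not read_kmers or not ref:
        return "unclassified"
    hits = Counter({taxon: len(read_kmers & kmers) for taxon, kmers in ref.items()})
    taxon, shared = hits.most_common(1)[0]
    return taxon if shared / len(read_kmers) >= min_frac else "unclassified"


def pre_bin(reads: dict[str, str], ref: dict[str, set[str]], keep: str = "Arthropoda") -> dict[str, str]:
    """Keep only reads binned to the target taxon; these are what would be assembled."""
    return {rid: seq for rid, seq in reads.items() if classify_read(seq, ref) == keep}


if __name__ == "__main__":
    ref = build_reference({
        "Arthropoda": ["ACGTACGGTTCAGGCTAAC"],  # made-up toy sequences
        "Bacteria": ["TTTTGGGCCCAAATTTGGA"],
    })
    reads = {"r1": "ACGGTTCAGGCT", "r2": "GGGCCCAAATTT", "r3": "GGGGGGGGGGGG"}
    print(pre_bin(reads, ref))  # prints {'r1': 'ACGGTTCAGGCT'}
```

Using canonical k-mers lets reads from either strand match the reference. Because the sketch can only recognize taxa present in its reference, reads from organisms missing from the reference fall through as "unclassified" and are simply dropped here.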
Methods
PacBio HiFi sequencing of high-molecular-weight DNA from a Cosmopolites sordidus pupa. Briefly: assembly with hifiasm; repeat masking with RepeatModeler and RepeatMasker; gene prediction with BRAKER2; and functional annotation via blastx against Swiss-Prot, DIAMOND against eggNOG, and OrthoFinder.
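The question of whether bacterial sequence is interspersed within genes of Arthropoda-binned contigs can be framed as an interval-overlap test between predicted gene coordinates (for example, from the BRAKER2 gene models) and contig regions covered by reads classified as bacterial. The minimal Python sketch below assumes BED-style 0-based, half-open coordinates and counts a gene as contaminated if it overlaps any such region by at least one base; the names, the one-base criterion, and the coordinates are hypothetical and not the authors' exact procedure.

```python
# Hypothetical sketch: flag predicted genes whose coordinates overlap contig regions
# covered by reads classified as bacterial. Coordinates are 0-based, half-open
# (BED-style); this is not the authors' exact criterion.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different contigs or disjoint)."""
    if a.contig != b.contig:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def contaminated_genes(genes: dict[str, Interval], bacterial: list[Interval]) -> list[str]:
    """IDs of genes overlapping at least one bacterial-classified region, sorted.

    Brute force O(genes x regions); an interval tree or a tool such as
    bedtools intersect would be more appropriate at genome scale.
    """
    return sorted(
        gid for gid, gene in genes.items()
        if any(overlap_bp(gene, region) > 0 for region in bacterial)
    )


if __name__ == "__main__":
    genes = {  # made-up coordinates for illustration
        "g1": Interval("ctg1", 100, 500),
        "g2": Interval("ctg1", 800, 1200),
        "g3": Interval("ctg2", 0, 300),
    }
    bacterial = [Interval("ctg1", 450, 900), Interval("ctg3", 0, 100)]
    print(contaminated_genes(genes, bacterial))  # prints ['g1', 'g2']
```

Returning the overlap length rather than a boolean leaves room for a stricter, equally hypothetical rule, such as requiring a minimum number of overlapping bases before a gene is counted.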
Created:
2023-10-06



