Supporting data for "MBGC: Multiple Bacteria Genome Compressor"

DataCite Commons | Updated 2025-05-26 | Indexed 2025-04-15
Download link:
http://gigadb.org/dataset/100967
Description:
Genomes within the same species are highly similar, a property exploited by specialized multiple-genome compressors. However, existing algorithms and tools target large genomes (e.g., mammalian), and their performance on bacterial strains is only moderate.

In this work, we propose MBGC, a specialized genome compressor that exploits the specific redundancy of bacterial genomes. Its characteristic features are the detection of both direct and reverse-complemented LZ-matches and careful management of a reference buffer in a multi-threaded implementation. Our tool is both compression-efficient and fast. On a collection of 168,311 bacterial genomes totalling 587 GB, we achieve a compression ratio of about 1265 and compression (resp. decompression) speeds of about 1580 MB/s (resp. 780 MB/s) using 8 hardware threads on a computer with a 14-core / 28-thread CPU and a fast SSD. This is almost 3 times more succinct and over 6 times faster in compression than the next best competitor.
Provider:
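To illustrate what "direct and reverse-complemented matches" means in the description above, here is a minimal Python sketch. It is not MBGC's implementation: the function names (`reverse_complement`, `find_match`) and the naive substring search are assumptions made purely for illustration, and a real LZ-style compressor would use an index over the reference buffer instead of a linear scan.

```python
# Illustrative sketch only; names are hypothetical and not taken from MBGC.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G), then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_match(reference: str, query: str):
    """Return (position, orientation) of `query` in `reference`, or None.

    The query is first searched as-is (direct match); if that fails,
    its reverse complement is searched (reverse-complemented match).
    """
    pos = reference.find(query)
    if pos != -1:
        return pos, "direct"
    pos = reference.find(reverse_complement(query))
    if pos != -1:
        return pos, "reverse-complemented"
    return None


if __name__ == "__main__":
    reference = "TTACGTTGGA"
    print(find_match(reference, "ACGT"))  # found as-is in the reference
    print(find_match(reference, "AACG"))  # found only as its reverse complement
```

Allowing the second kind of match matters because a DNA fragment may be read from either strand, so a sequence that looks new can still be redundant with respect to the reference once it is reverse-complemented.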
GigaScience Database
Created:
2022-01-12