Supporting data for "Generation and application of pseudo-long reads for metagenome assembly"
Source: DataCite Commons. Updated: 2025-05-26. Indexed: 2025-04-15
Download link:
http://gigadb.org/dataset/102214
Description:
Metagenome assembly using high-throughput sequencing data is a powerful method for constructing microbial genomes from environmental samples without cultivation. However, metagenome assembly is a complex and challenging task, especially when only short reads are available, because the metagenome is a mixture of the genomes of multiple microorganisms. Although long-read sequencing technologies have been developed and are beginning to be used for metagenome assembly, many metagenome studies still rely on short reads because long reads have a higher sequencing cost and error rate.
In this study, we present a new method called PLR-GEN. It creates pseudo-long reads from metagenome short reads based on given reference genome sequences, taking into account the small sequence variations that exist among individual genomes of the same or different species. When applied to a mock community dataset from the Human Microbiome Project, PLR-GEN extended 101 bp short reads to pseudo-long reads with an N50 of 33 Kbp and a 0.4% error rate. Using these pseudo-long reads clearly improved metagenome assembly in terms of the number of sequences, assembly contiguity, and the prediction of species and genes.
PLR-GEN can be used to generate artificial long-read sequences at no extra sequencing cost, thus aiding various studies that use metagenomes.
Provider:
GigaScience Database
Created:
2022-04-05



