MNBC: a multithreaded Minimizer-based Naïve Bayes Classifier for improved metagenomic sequence classification
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/10568964
Resource description:
These files provide supplementary data underlying the article doi.org/10.1093/bioinformatics/btae601 (see Figure 1 in the article):
37345_filtered_training_and_test_genomes_list.txt: RefSeq assembly sequence filenames of the 37345 filtered training and test genomes
taxonomy_37345_filtered_training_and_test_genomes.txt: Taxonomy file for all 37345 filtered training and test genomes
Uniform_reference_database_31991_training_genomes_assemblyID_list.txt: RefSeq assembly accessions of the 31991 training genomes in the uniform reference database
uniform_reference_database.tar.gz_1 to uniform_reference_database.tar.gz_10: the uniform reference database archive split into 10 parts. Concatenate them in order (1 to 10) into a single file with the cat command, then decompress that file; the resulting folder is the uniform reference database.
taxonomy_uniform_reference_database.txt: Taxonomy file for the uniform reference database (i.e. the 31991 training genomes)
5354_test_genomes_assemblyID_list.txt: RefSeq assembly accessions of the 5354 test genomes
testReads_NextSeq_C0.05.fasta.gz: 6562565 positive reads, 150 bp long, randomly generated from the 5354 test genomes to simulate NextSeq sequencing (0.05 coverage)
testReads_MiSeq_C0.05.fasta.gz: 3282728 positive reads, 300 bp long, randomly generated from the 5354 test genomes to simulate MiSeq sequencing (0.05 coverage)
testReads_Nanopore_C0.05.fasta.gz: 181912 positive reads with normally distributed lengths between 1 kb and 10 kb, randomly generated from the 5354 test genomes to simulate Nanopore sequencing (0.05 coverage)
negaReads_NextSeq_C0.05.fasta.gz: 10143 negative reads, 150 bp long, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome to simulate NextSeq sequencing (0.05 coverage)
negaReads_MiSeq_C0.05.fasta.gz: 5072 negative reads, 300 bp long, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome to simulate MiSeq sequencing (0.05 coverage)
negaReads_Nanopore_C0.05.fasta.gz: 277 negative reads with normally distributed lengths between 1 kb and 10 kb, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome to simulate Nanopore sequencing (0.05 coverage)
CAMI2_reference_database_16864_genomes_list.txt: RefSeq assembly sequence filenames of the 16864 genomes and chromosomes in the CAMI2 reference database
taxonomy_CAMI2_reference_database.txt: Taxonomy file for the CAMI2 reference database
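The ten uniform_reference_database.tar.gz_N parts listed above have to be joined in numeric order before decompressing. In bash, brace expansion preserves that order: "cat uniform_reference_database.tar.gz_{1..10} > uniform_reference_database.tar.gz" followed by "tar -xzf uniform_reference_database.tar.gz". A plain glob such as "uniform_reference_database.tar.gz_*" sorts lexicographically and places part 10 directly after part 1, which corrupts the merged archive. The Python sketch below performs the same merge and extraction. It assumes all parts sit in one directory and that the merged file is a gzip-compressed tar archive (as the .tar.gz name suggests); the function names are hypothetical helpers, not part of MNBC.

```python
"""Reassemble and unpack a split .tar.gz archive (hypothetical helper sketch)."""
import re
import shutil
import tarfile
from pathlib import Path


def part_index(path: Path) -> int:
    """Return N for a part file named '<prefix>_N'."""
    match = re.search(r"_(\d+)$", path.name)
    if match is None:
        raise ValueError(f"not a numbered part file: {path.name}")
    return int(match.group(1))


def merge_parts(directory: Path, prefix: str, output: Path) -> list:
    """Concatenate <prefix>_1, <prefix>_2, ... into `output` in numeric order.

    Sorting by the numeric suffix matters: a lexicographic sort (like a shell
    glob) would put _10 between _1 and _2 and corrupt the archive.
    """
    parts = sorted(directory.glob(f"{prefix}_*"), key=part_index)
    if not parts:
        raise FileNotFoundError(f"no parts matching {prefix}_* in {directory}")
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)  # stream, so large parts are not held in memory
    return parts


def extract(archive: Path, dest: Path) -> None:
    """Decompress a gzip-compressed tar archive into `dest`."""
    # The "data" extraction filter rejects unsafe member paths; it exists in
    # Python 3.12+ and some patched earlier releases, so only pass it if present.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest, **kwargs)
```

Run from the download directory, merge_parts(Path("."), "uniform_reference_database.tar.gz", Path("uniform_reference_database.tar.gz")) writes the merged archive (its name has no "_N" suffix, so it is not picked up as a part), and extract(Path("uniform_reference_database.tar.gz"), Path(".")) unpacks the database folder.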
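The read counts in the file list can be cross-checked with the usual definition of coverage, C = N × L / G, where N is the number of reads, L the (mean) read length and G the total length of the sequence sampled. The Python sketch below inverts this to estimate G from each file's numbers and, conversely, computes the N needed for a given G. Two assumptions are not stated in the source: the Nanopore mean read length is taken as 5.5 kb (the midpoint of the 1 kb to 10 kb range), and N is rounded up. This is an arithmetic check only, not a description of how the reads were actually simulated.

```python
"""Cross-check read counts against coverage: C = N * L / G."""
import math


def implied_reference_length(n_reads: int, mean_read_len: float, coverage: float) -> float:
    """Total sampled sequence length G (bp) implied by N reads of mean length L at coverage C."""
    return n_reads * mean_read_len / coverage


def reads_for_coverage(reference_len: int, mean_read_len: float, coverage: float) -> int:
    """Reads N needed to reach coverage C on G bp; rounding up is an ASSUMPTION."""
    return math.ceil(coverage * reference_len / mean_read_len)


# (read count, mean read length in bp) taken from the file list.
# ASSUMPTION: Nanopore mean length = 5500 bp, the midpoint of the 1-10 kb range.
DATASETS = {
    "testReads_NextSeq": (6562565, 150),
    "testReads_MiSeq": (3282728, 300),
    "testReads_Nanopore": (181912, 5500),
    "negaReads_NextSeq": (10143, 150),
    "negaReads_MiSeq": (5072, 300),
    "negaReads_Nanopore": (277, 5500),
}

COVERAGE = 0.05  # all files are labelled C0.05

if __name__ == "__main__":
    for name, (n_reads, read_len) in DATASETS.items():
        size_mbp = implied_reference_length(n_reads, read_len, COVERAGE) / 1e6
        print(f"{name}: implies ~{size_mbp:,.1f} Mbp of sampled sequence")
```

With these inputs, the NextSeq and MiSeq test files both imply roughly 19.7 Gbp of test-genome sequence, and the short-read negative files imply about 30.4 Mbp, close to the size of A. thaliana chromosome 1. Conversely, assuming a chromosome 1 length of 30,427,671 bp (the TAIR10 value), reads_for_coverage(30_427_671, 150, 0.05) returns 10143 and reads_for_coverage(30_427_671, 300, 0.05) returns 5072, matching the negative-read counts listed.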
Tip: The taxonomy file "taxonomy_uniform_reference_database.txt" can be used directly only with MNBC v1.1 or earlier. For later versions, regenerate it with the "MNBC taxonomy" program.
Created:
2025-02-05



