
MNBC: a multithreaded Minimizer-based Naïve Bayes Classifier for improved metagenomic sequence classification

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10568964
Description:
These files provide the supplementary data underlying the article doi.org/10.1093/bioinformatics/btae601 (see Figure 1 in the article):

- 37345_filtered_training_and_test_genomes_list.txt: RefSeq assembly sequence filenames of the 37345 filtered training and test genomes.
- taxonomy_37345_filtered_training_and_test_genomes.txt: taxonomy file for all 37345 filtered training and test genomes.
- Uniform_reference_database_31991_training_genomes_assemblyID_list.txt: RefSeq assembly accessions of the 31991 training genomes in the uniform reference database.
- uniform_reference_database.tar.gz_1 to uniform_reference_database.tar.gz_10: parts of a split archive. Merge them into a single file with the cat command, then decompress it; the resulting folder is the uniform reference database.
- taxonomy_uniform_reference_database.txt: taxonomy file for the uniform reference database (i.e. the 31991 training genomes).
- 5354_test_genomes_assemblyID_list.txt: RefSeq assembly accessions of the 5354 test genomes.
- testReads_NextSeq_C0.05.fasta.gz: 6562565 positive reads, 150 bp long, randomly generated from the 5354 test genomes, simulating NextSeq sequencing (0.05 coverage).
- testReads_MiSeq_C0.05.fasta.gz: 3282728 positive reads, 300 bp long, randomly generated from the 5354 test genomes, simulating MiSeq sequencing (0.05 coverage).
- testReads_Nanopore_C0.05.fasta.gz: 181912 positive reads with normally distributed lengths between 1 kb and 10 kb, randomly generated from the 5354 test genomes, simulating Nanopore sequencing (0.05 coverage).
- negaReads_NextSeq_C0.05.fasta.gz: 10143 negative reads, 150 bp long, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome, simulating NextSeq sequencing (0.05 coverage).
- negaReads_MiSeq_C0.05.fasta.gz: 5072 negative reads, 300 bp long, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome, simulating MiSeq sequencing (0.05 coverage).
- negaReads_Nanopore_C0.05.fasta.gz: 277 negative reads with normally distributed lengths between 1 kb and 10 kb, randomly generated from chromosome 1 of the Arabidopsis thaliana reference genome, simulating Nanopore sequencing (0.05 coverage).
- CAMI2_reference_database_16864_genomes_list.txt: RefSeq assembly sequence filenames of the 16864 genomes and chromosomes in the CAMI2 reference database.
- taxonomy_CAMI2_reference_database.txt: taxonomy file for the CAMI2 reference database.

Tip: to use the taxonomy file "taxonomy_uniform_reference_database.txt" directly, use MNBC v1.1 or earlier. With later versions of MNBC, the file must be regenerated with the "MNBC taxonomy" program.
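One detail worth care when merging the split archive: in a shell, `cat uniform_reference_database.tar.gz_*` expands the glob in lexical order (_1, _10, _2, ...), which would produce a corrupt archive, whereas bash brace expansion `cat uniform_reference_database.tar.gz_{1..10} > uniform_reference_database.tar.gz` keeps numeric order. The minimal Python sketch below does the same merge portably, assuming the ten parts are plain byte-level splits (as the cat instruction implies) sitting in one directory. It also counts records in a gzipped FASTA file so the read counts listed above can be checked after download, assuming standard FASTA where each record has exactly one header line starting with ">". The helper names `merge_parts`, `extract_archive` and `count_fasta_records` are hypothetical and are not part of MNBC.

```python
import gzip
import shutil
import tarfile
from pathlib import Path

# Hypothetical constant: the common prefix of the ten split parts.
PART_PREFIX = "uniform_reference_database.tar.gz_"


def part_index(path: Path) -> int:
    """Return the numeric suffix of a part file, e.g. '..._10' -> 10."""
    return int(path.name[len(PART_PREFIX):])


def merge_parts(directory: Path, output: Path) -> list[Path]:
    """Concatenate the split parts byte-for-byte in numeric order (1, 2, ..., 10)."""
    parts = sorted(
        (p for p in directory.glob(PART_PREFIX + "*")
         if p.name[len(PART_PREFIX):].isdigit()),
        key=part_index,  # numeric, not lexical, ordering
    )
    if not parts:
        raise FileNotFoundError(f"no {PART_PREFIX}* files in {directory}")
    with output.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract_archive(archive: Path, dest: Path) -> None:
    """Decompress and unpack the merged .tar.gz into dest."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)


def count_fasta_records(path: Path) -> int:
    """Count FASTA records (header lines starting with '>') in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, in the download directory one might call `merge_parts(Path("."), Path("uniform_reference_database.tar.gz"))`, then `extract_archive(Path("uniform_reference_database.tar.gz"), Path("."))`, and compare `count_fasta_records(Path("testReads_NextSeq_C0.05.fasta.gz"))` with the read count given for that file. Streaming with `shutil.copyfileobj` avoids loading the multi-part archive into memory.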
Created: 2025-02-05