Arthropod Kraken2 Database v1

DataCite Commons, updated 2025-08-18, indexed 2026-05-05
Download link:
https://figshare.scilifelab.se/articles/dataset/Arthropod_Kraken2_Database_v1/29666605
Resource description:
<b>Kraken2 Arthropod Reference Database </b><b><i>v.1</i></b>

Kraken2 (v2.1.2) database containing all 2,593 reference assemblies for Arthropoda available on NCBI as of March 2023.

This database was built for and used in the analysis of shotgun sequencing data of bulk DNA from Malaise trap samples collected by the Insect Biome Atlas, in the context of the manuscript "<b>Small Bugs, Big Data: Metagenomics for arthropod biodiversity monitoring</b>" by López Clinton Samantha, Iwaszkiewicz-Eggebrecht Ela, Miraldo Andreia, Goodsell Robert, Webster Mathew T, Ronquist Fredrik, van der Valk Tom (for submission to <i>Ecology and Evolution</i>).

For custom database building, Kraken2 requires every header in the reference assembly FASTA files to be annotated with "<b>kraken:taxid|XXX</b>" at the end of the header, where "<b>XXX</b>" is the corresponding National Center for Biotechnology Information (NCBI) taxID of the species. The code used to add the taxID information to each FASTA file header, and to update the <b>accession2taxid.map</b> file required by Kraken2 for database building, is available in this GitHub repository (also linked under "Related Materials" below).

<b>Content</b>

Below is a list of the files in this item (in addition to the README and MANIFEST files), with their descriptions. The first three files (marked with a *) are required to run Kraken2 classifications using the database.

* <b>hash.k2d.gz</b> - A hash file with all minimiser to taxon mappings (855 GB).
* <b>opts.k2d</b> - A file containing all options used when building the Kraken2 database (64 B).
* <b>taxo.k2d</b> - A file containing the taxonomy information used to build the database (385.9 KB).
<b>seqid2taxid.map.gz</b> - A file containing contig accession numbers and their corresponding taxids (810.6 MB).
Note that this file is needed by Kraken2 when <b>building</b> the database. Because it was updated during custom building, it is included for reference, but it is not required to use the database for <b>classification</b>.
<b>genome_assembly_metadata.tsv</b> - NCBI-generated table (TSV format, gzipped) of all reference assemblies for Arthropoda as of March 2023, which were used in the database construction. It includes the columns: Assembly Accession, Assembly Name, Organism Name, Organism Infraspecific Names Breed, Organism Infraspecific Names Strain, Organism Infraspecific Names Cultivar, Organism Infraspecific Names Ecotype, Organism Infraspecific Names Isolate, Organism Infraspecific Names Sex, Annotation Name, Assembly Stats Total Sequence Length, Assembly Level, Assembly Submission, and WGS project accession.

<b>How to use the database</b>

Download the <b>hash.k2d.gz</b>, <b>opts.k2d</b>, and <b>taxo.k2d</b> files to the same directory (e.g. /PATH/TO/DATABASE/).
Unzip the <b>hash.k2d.gz</b> file.
Install or load Kraken2 to run classification on sequencing data using the database.
When running Kraken2, give the path to the directory (not to the individual files) with the <b>--db</b> flag (e.g. kraken2 --db /PATH/TO/DATABASE/ ...).

Note that Kraken2 must load the whole database into memory before it can classify any sequencing reads, so make sure you have access to enough memory before running (the uncompressed hash file is around 1.1 TB).
We also recommend using the Kraken2 option <b>--memory-mapping</b>, as it ensures the database is loaded once for all samples, instead of once for each individual sample, saving considerable time and resources.
For more information on using Kraken2, see the Kraken2 wiki manual.

This database was built by <b>Samantha López Clinton</b> (samantha.lopezclinton@nrm) and <b>Tom van der Valk</b> (tom.vandervalk@nrm.se).
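
The header annotation described above can be illustrated with a minimal Python sketch. This is not the code from the linked GitHub repository; the function names `annotate_header` and `annotate_fasta`, the contig name, and the taxID in the usage note are hypothetical. The sketch assumes the tag should sit at the end of the sequence identifier (the first whitespace-delimited token of the header), which is where the Kraken2 manual's own example places it; for headers without a description this is the same as the end of the header.

```python
from typing import Iterable, Iterator


def annotate_header(header: str, taxid: int) -> str:
    """Return a FASTA header with a kraken:taxid tag added to its sequence ID.

    Assumption: the tag goes at the end of the first whitespace-delimited
    token; any free-text description after it is kept unchanged.
    """
    body = header.rstrip("\n")
    if not body.startswith(">"):
        raise ValueError("not a FASTA header line")
    parts = body.split(None, 1)
    seq_id = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    if "kraken:taxid|" in seq_id:
        return body  # already annotated, leave untouched
    tagged = f"{seq_id}|kraken:taxid|{taxid}"
    return f"{tagged} {description}" if description else tagged


def annotate_fasta(lines: Iterable[str], taxid: int) -> Iterator[str]:
    """Yield FASTA lines with every header annotated; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield annotate_header(line, taxid) + "\n"
        else:
            yield line
```

For instance, `annotate_header(">contig_1 example description", 7460)` gives `>contig_1|kraken:taxid|7460 example description`. In a real build, each assembly would be paired with the taxID of its own species, for example via a lookup from the assembly metadata table.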
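
The usage steps above can be sketched as a small Python helper that checks the database directory and assembles the Kraken2 command line. This is an illustrative sketch only: the helper names `missing_db_files` and `kraken2_command` and all file paths are hypothetical, and it assumes `hash.k2d.gz` has already been unzipped to `hash.k2d`. The flags used (`--db`, `--memory-mapping`, `--threads`, `--report`, `--output`) are standard Kraken2 options.

```python
from pathlib import Path
from typing import List, Sequence

# Files the description marks as required for classification
# (hash.k2d is the unzipped form of hash.k2d.gz).
REQUIRED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_db_files(db_dir: str) -> List[str]:
    """Return the required database files that are absent from db_dir."""
    db = Path(db_dir)
    return [name for name in REQUIRED_FILES if not (db / name).is_file()]


def kraken2_command(
    db_dir: str,
    reads: Sequence[str],
    report: str,
    output: str,
    threads: int = 1,
    memory_mapping: bool = True,
) -> List[str]:
    """Build the kraken2 argument list; --db points at the directory, not a file."""
    cmd = [
        "kraken2",
        "--db", str(db_dir),
        "--threads", str(threads),
        "--report", str(report),
        "--output", str(output),
    ]
    if memory_mapping:
        cmd.append("--memory-mapping")
    cmd.extend(str(r) for r in reads)
    return cmd
```

A caller could first check that `missing_db_files("/PATH/TO/DATABASE/")` returns an empty list, then pass the result of `kraken2_command(...)` to `subprocess.run(cmd, check=True)` once per sample. Building the command as a list avoids shell quoting problems with unusual file names.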
Provider:
Swedish Museum of Natural History
Created:
2025-08-11