
NanoClass-compatible BOLD CO1 databases

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/5751456
Resource description:
BOLD CO1 databases reformatted for use in NanoClass (https://github.com/ejongepier/NanoClass; version 0.3.0-beta or higher) and QIIME2. Three separate databases are included, for use with the primers mtD, LCO-HCO and CI. The databases include reference sequences and reference taxonomies for use in NanoClass, as well as pre-trained classifiers for use in QIIME2. See the usage instructions below. For questions, please contact e.jongepier@uva.nl.

========================================== WARNING ==========================================

Please note that this version of a custom BOLD CO1 db comes with absolutely no warranty.

When using this db in NanoClass, mind that it has only been tested with the methods ["megablast","minimap","spingo"]. NanoClass cannot be run in combination with these BOLD CO1 databases using the methods ["mothur","centrifuge","kraken"]. Compatibility with ["blast","dcmegablast","qiime","rdp"] is untested. Remove the incompatible tools, and any others you want to skip, from the methods list in NanoClass/config.yaml (see also the NanoClass documentation: https://ejongepier.github.io/NanoClass/).

Never use this database in combination with the NanoClass snakemake -F parameter, or this BOLD CO1 database will be overwritten by the default 16S SILVA database.

========================================== DESCRIPTION ==========================================

BOLD CO1 database last downloaded on 20210420 and reformatted for use in QIIME2 and NanoClass.
To clean up the BOLD CO1 db, the following steps were taken (steps 7 to 11 were repeated for each of the 3 primers):

1. remove identical duplicates [3597874]
2. drop seqs with non-IUPAC characters [3597839]
3. remove leading and trailing ambiguous bases [3597839]
4. remove low quality reads
5. remove reads with homopolymer runs
6. filter by length
7. extract fragments between primer sequences [mtD:112450; CI:121391; LCO-HCO:65307]
8. dereplicate / cluster [mtD:55075; CI:46470; LCO-HCO:24835]
9. remove uninformative taxonomic labels [mtD:55073; CI:46466; LCO-HCO:24832]
10. reformat db for use in NanoClass
11. train classifier based on fragments

========================================== HOW TO USE THESE DBS ==========================================

Use in NanoClass:

Unzip the database and copy the reference taxonomy and the (unzipped) reference sequences to the NanoClass/db/common directory, like so:

$ cp mtD/bold-v20210421-taxonomy-mtD.tsv /path/to/NanoClass/db/common/ref-taxonomy.txt
$ gzip -d -c mtD/bold-v20210421-frags-mtD.fa.gz > /path/to/NanoClass/db/common/ref-seqs.fna

Something similar can be done for the other two primers (CI or LCO-HCO). Only these three primers are supported at this point.

Next, create an empty ref-seqs.aln file. This prevents NanoClass from automatically downloading the default 16S SILVA database, which would overwrite the BOLD db you just copied into NanoClass/db/common:

$ touch /path/to/NanoClass/db/common/ref-seqs.aln

Finally, you need to make a change to NanoClass/Snakefile, i.e. change the first line below into the second:

optrules.extend(["plots/precision.pdf"] if len(config["methods"]) > 2 else [])
optrules.extend(["plots/precision.pdf"] if len(config["methods"]) > 200 else [])

This disables the computation of precision plots by NanoClass, as these are not supported in combination with the custom BOLD CO1 databases. Raising the threshold from 2 to 200 means the precision plot is only requested when more than 200 methods are configured, so in practice it is never computed.
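The WARNING section and the Snakefile edit both depend on the `config["methods"]` list. A small pre-run check could be sketched as follows. This is plain Python that mirrors the Snakefile condition. `check_methods` and `precision_plots_requested` are hypothetical helpers, not part of NanoClass, and the method groups are copied from the WARNING section. Note that with exactly the three tested methods, the original condition `> 2` is still true, which is why the threshold edit is needed.

```python
# Hypothetical pre-run check, not part of NanoClass. Method groups are
# copied from the compatibility notes in the WARNING section.
TESTED = {"megablast", "minimap", "spingo"}
INCOMPATIBLE = {"mothur", "centrifuge", "kraken"}
UNTESTED = {"blast", "dcmegablast", "qiime", "rdp"}


def check_methods(methods):
    """Sort a list of NanoClass method names into compatibility groups."""
    report = {"tested": [], "untested": [], "incompatible": [], "unknown": []}
    for method in methods:
        if method in TESTED:
            report["tested"].append(method)
        elif method in INCOMPATIBLE:
            report["incompatible"].append(method)
        elif method in UNTESTED:
            report["untested"].append(method)
        else:
            report["unknown"].append(method)
    return report


def precision_plots_requested(methods, threshold=2):
    """Mirror the Snakefile condition: the precision plot is added when more
    than `threshold` methods are configured (2 originally, 200 after the edit)."""
    return len(methods) > threshold


if __name__ == "__main__":
    methods = ["megablast", "minimap", "spingo"]
    print(check_methods(methods))
    print(precision_plots_requested(methods))       # True: original threshold
    print(precision_plots_requested(methods, 200))  # False: edited threshold
```

Any name listed under "incompatible" has to be removed from NanoClass/config.yaml before running with these databases.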
Also mind that you need to change the nanofilt minlen and maxlen in NanoClass/config.yaml to capture the appropriate fragment length for your primer. For the mtD primer I used minlen 600 and maxlen 900 for testing.

Use in QIIME2:

You can use the trained classifier directly in QIIME2, like so:

$ qiime feature-classifier classify-sklearn \
    --i-classifier mtD/bold-v20210421-classifier-mtD.qza \
    --i-reads .qza \
    --o-classification .qza \
    --verbose

Something similar can be done for the other two primers (CI or LCO-HCO). Only these three primers are supported at this point. The classifiers have only been tested with the sklearn algorithm.
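Someone rebuilding a comparable database from a newer BOLD download might want to approximate the simpler sequence-level clean-up steps from the DESCRIPTION section. These are duplicate removal, the non-IUPAC check, trimming of ambiguous ends, the homopolymer filter and the length filter. Below is a minimal sketch under stated assumptions, not the author's pipeline. The description gives no thresholds, so `max_homopolymer`, `min_len` and `max_len` are hypothetical values the caller must supply, and `clean_records` is an illustrative name. The low-quality filter, primer extraction, clustering, taxonomy clean-up and classifier training are not covered.

```python
import re

# IUPAC nucleotide codes; any other character (digits, '-', '*', ...) fails the check.
IUPAC_CODES = set("ACGTURYSWKMBDHVN")
# Ambiguity codes that are trimmed from both sequence ends.
AMBIGUOUS_CODES = "RYSWKMBDHVN"


def clean_records(records, *, max_homopolymer, min_len, max_len):
    """Filter (id, sequence) pairs with simplified versions of some clean-up
    steps. Thresholds are placeholders: the values used for this database
    are not stated in its description."""
    # A base followed by >= max_homopolymer copies of itself, i.e. a run
    # longer than max_homopolymer.
    homopolymer = re.compile(r"([ACGT])\1{%d,}" % max_homopolymer)
    seen = set()
    kept = []
    for rec_id, seq in records:
        seq = seq.upper()
        # Remove identical duplicates (the first occurrence is kept).
        if seq in seen:
            continue
        seen.add(seq)
        # Drop sequences containing non-IUPAC characters.
        if not set(seq) <= IUPAC_CODES:
            continue
        # Trim leading and trailing ambiguous bases.
        seq = seq.strip(AMBIGUOUS_CODES)
        # Drop sequences with an overly long homopolymer run.
        if homopolymer.search(seq):
            continue
        # Keep only sequences inside the (hypothetical) length window.
        if min_len <= len(seq) <= max_len:
            kept.append((rec_id, seq))
    return kept


if __name__ == "__main__":
    records = [
        ("a", "NNACGTACGTNN"),
        ("b", "NNACGTACGTNN"),      # exact duplicate of "a"
        ("c", "ACGT-ACGT"),         # '-' is not an IUPAC code
        ("d", "ACG" + "A" * 10 + "T"),  # run of 10 A's
        ("e", "ACGTTGCA"),
    ]
    print(clean_records(records, max_homopolymer=6, min_len=5, max_len=50))
    # [('a', 'ACGTACGT'), ('e', 'ACGTTGCA')]
```

Deduplication here compares sequences before trimming, following the order of the clean-up list. Checking again after trimming would catch more near-identical entries.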
Created:
2021-12-03