
COI rCRUX filtered metabarcoding reference database and naive-bayes classifier

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10456133
Description:
COI metabarcoding reference database and naive Bayes classifier in QIIME 2 .qza format, with Insecta and Amphibia sequences removed. The original database was downloaded from here and was built with rCRUX using the Leray CO1 primers.

rCRUX details
The rCRUX database was generated by combining and de-replicating the following databases:
    Leray CO1-ncbi-mitochondrial (https://doi.org/10.5281/zenodo.8407603)
    Leray CO1-embl (https://doi.org/10.5281/zenodo.8407606)
    Leray CO1-searchterm (https://doi.org/10.5281/zenodo.8407620)

Primer name: Leray CO1
Gene: CO1
Length of target: ~313 bp
Forward sequence (5'-3'): GGWACWGGWTGAACWGTWTAYCCYCC
Reverse sequence (5'-3'): TANACYTCNGGRTGNCCRAARAAYCA
Reference: Leray, M., Yang, J. Y., Meyer, C. P., Mills, S. C., Agudelo, N., Ranwez, V., ... & Machida, R. J. (2013). A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Frontiers in Zoology, 10(1), 34.

Details of how the database was filtered and the classifier trained:

1. Pull out all taxonomy entries matching the terms 'Insecta' or 'Amphibia' using grep:
    grep 'Insecta' CO1_combined_derep_and_clean_taxonomy.txt > CO1_combined_derep_and_clean_taxonomy-Insecta.txt
    grep 'Amphibia' CO1_combined_derep_and_clean_taxonomy.txt > CO1_combined_derep_and_clean_taxonomy-Amphibia.txt
    cat CO1_combined_derep_and_clean_taxonomy-Insecta.txt CO1_combined_derep_and_clean_taxonomy-Amphibia.txt > CO1_combined_derep_and_clean_taxonomy-Insecta-Amphibia.txt

2. Use this list of entries to remove those two groups from the taxonomy file (Python script "grep-vf_Python.py", attached to the record).

3. In the output file of grep-vf_Python.py, only the first column is the actual FASTA header (sequence ID), so extract that column with awk:
    awk '{print $1}' CO1_combined_derep_and_clean_taxonomy-noInsectaAmphibia.txt > filtered_taxa_toextract.txt

4. Use this list of non-Insecta, non-Amphibia sequence IDs to filter the original COI database FASTA with seqkit:
    seqkit grep -f filtered_taxa_toextract.txt CO1_combined_derep_and_clean.fasta > CO1_combined_derep_and_clean-noInsectaAmphibia.fa

5. Import the filtered FASTA and taxonomy files into QIIME 2 .qza format:
    qiime tools import --type 'FeatureData[Sequence]' \
      --input-path CO1_combined_derep_and_clean-noInsectaAmphibia.fa \
      --output-path COI_rCRUX_filt_20231110.qza
    qiime tools import --type 'FeatureData[Taxonomy]' \
      --input-path CO1_combined_derep_and_clean_taxonomy-noInsectaAmphibia.txt \
      --output-path COI_rCRUX_taxonomy_filt_20231110.qza \
      --input-format 'HeaderlessTSVTaxonomyFormat'

6. Finally, train the classifier:
    qiime feature-classifier fit-classifier-naive-bayes \
      --i-reference-reads COI_rCRUX_filt_20231110.qza \
      --i-reference-taxonomy COI_rCRUX_taxonomy_filt_20231110.qza \
      --p-classify--chunk-size 5000 \
      --o-classifier COI_rCRUX_filt_20231110-classifier.qza
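The script grep-vf_Python.py is attached to the record but not reproduced here, so the following is only a minimal Python sketch of what steps 1 to 4 accomplish together, not the author's script. It assumes the taxonomy file has the sequence ID in its first whitespace-separated column, and it simplifies steps 1 and 2 to "drop any taxonomy line that contains 'Insecta' or 'Amphibia' as a plain substring". The function names filter_taxonomy, first_column, filter_fasta and run are hypothetical.

```python
"""Hypothetical sketch of steps 1-4: drop Insecta/Amphibia entries, then subset the FASTA."""
from typing import Iterable, Iterator, List, Set, Tuple

# Assumption: lines are excluded by plain substring match, like `grep 'Insecta'`.
EXCLUDED_TERMS: Tuple[str, ...] = ("Insecta", "Amphibia")


def filter_taxonomy(lines: Iterable[str],
                    excluded: Tuple[str, ...] = EXCLUDED_TERMS) -> List[str]:
    """Steps 1-2: keep non-blank taxonomy lines that mention none of the excluded terms."""
    return [line for line in lines
            if line.strip() and not any(term in line for term in excluded)]


def first_column(lines: Iterable[str]) -> Set[str]:
    """Step 3: the sequence ID is the first whitespace-separated field (like awk '{print $1}')."""
    return {line.split()[0] for line in lines if line.strip()}


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Step 4: yield whole FASTA records whose ID (first word of the header) is in `keep`,
    approximating the default ID matching of `seqkit grep -f`."""
    keep_record = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep_record = bool(fields) and fields[0] in keep
        if keep_record:
            yield line


def run(taxonomy_in: str, fasta_in: str, taxonomy_out: str, fasta_out: str) -> None:
    """Apply steps 1-4 to files on disk; the outputs are the inputs to `qiime tools import`."""
    with open(taxonomy_in) as fh:
        kept = filter_taxonomy(fh)
    with open(taxonomy_out, "w") as fh:
        fh.writelines(kept)
    ids = first_column(kept)
    # Stream the FASTA line by line so a large reference file is never held in memory.
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        dst.writelines(filter_fasta(src, ids))
```

With the file names used in the steps above, a call would look like run("CO1_combined_derep_and_clean_taxonomy.txt", "CO1_combined_derep_and_clean.fasta", "CO1_combined_derep_and_clean_taxonomy-noInsectaAmphibia.txt", "CO1_combined_derep_and_clean-noInsectaAmphibia.fa"). Keeping the retained IDs in a set makes each header lookup constant-time, which matters when scanning a large combined reference database.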
Created:
2024-01-03