Generation of a global freshwater algal taxonomic database by application of PCR-free rbcL gene detection and machine-learning-based taxonomic classification to public metagenome datasets

Source: Figshare. Updated: 2025-08-27. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/_b_Generation_of_a_global_freshwater_algal_taxonomic_database_by_application_of_PCR-free_b_b_i_rbcL_i_b_b_gene_detection_and_machine-learning-based_taxonomic_classification_to_public_metagenome_datasets_b_/29996962
Resource description:
Taxonomic Classifier: All rbcL accessions were collected from the February 4, 2023 public release of the Barcode of Life Database (BOLD). Bacterial sequences and accessions without species assignments were removed, leaving 91,997 sequences. High-confidence, informative bacterial rbcL sequences were then added to the database. These included all rbcL sequences collected from isolate genomes in the JGI IMG database corresponding to phylum Cyanobacteriota and all American Type Culture Collection (ATCC) strains, 565 in total. The single longest sequence for each species was used for classifier training (43,315 sequences), while the remaining 49,247 sequences were used for model validation. Training-set sequences and their corresponding seven-level taxonomy were used to train a naive Bayes classifier in a qiime2-amplicon-2023.9 conda environment. To use this classifier, follow the standard procedure for classifying imported reads in the qiime2-amplicon-2023.9 conda environment. Note that the classifier may not function in other QIIME 2 versions, because the scikit-learn version bundled with the environment differs between releases.

rbcL Database: All 4,206 assembled freshwater metagenomes in the IMG database, defined as those with the metadata tag “Ecosystem Type = Freshwater”, were identified on February 23, 2023 and scanned for genes that had been assigned the domain annotation pfam00016 by the IMG metagenome annotation pipeline. Gene nucleotide sequences were downloaded along with source genome metadata and estimated scaffold read depth metrics. Genes were imported into qiime2-amplicon-2023.9 and classified with the constructed rbcL taxonomic classifier at a minimum confidence threshold of 0.7.

Table S1 includes all rbcL sequences detected in the course of the analysis, regardless of final taxonomy.

Table S2 includes only rbcL sequences classified as one of the eight trained algal phyla.
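
The training/validation partition described above (the single longest sequence per species for training, all remaining sequences for validation) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Record` type, its field names, the toy sequences, and the tie-breaking rule (first occurrence wins when two sequences have equal length) are all assumptions made for illustration, since the description does not specify them.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    accession: str  # hypothetical identifier field
    species: str    # species-level label from the seven-level taxonomy
    sequence: str   # rbcL nucleotide sequence


def split_longest_per_species(records):
    """Return (training, validation).

    The longest record of each species goes to training; every other
    record of that species goes to validation.
    """
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec.species].append(rec)

    training, validation = [], []
    for species_records in by_species.values():
        # Assumption: on equal lengths, max() keeps the first occurrence.
        longest = max(species_records, key=lambda r: len(r.sequence))
        training.append(longest)
        validation.extend(r for r in species_records if r is not longest)
    return training, validation


# Toy input; accessions and sequences are invented placeholders.
records = [
    Record("A1", "Chlorella vulgaris", "ATGTCACCACAA"),
    Record("A2", "Chlorella vulgaris", "ATGTCACC"),
    Record("B1", "Microcystis aeruginosa", "ATGGTT"),
    Record("A3", "Chlorella vulgaris", "ATGTCACCACAAACTG"),
]
training, validation = split_longest_per_species(records)
```

Under this scheme every input sequence lands in exactly one of the two sets, which agrees with the reported counts: 43,315 + 49,247 = 92,562 = 91,997 + 565.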
Created:
2025-08-27