Caribou pipeline for the alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning
Source: DataCite Commons. Updated: 2026-04-21. Indexed: 2025-04-15.
Download link:
https://www.frdr-dfdr.ca/repo/dataset/6536e425-a10a-46b1-acae-da529f061915
Description:
This dataset contains the sequencing data used to train the models of the Caribou pipeline, which we developed for alignment-free bacterial identification and classification in metagenomics sequencing data using machine learning. The datasets were derived from GTDB release 202 (https://data.gtdb.ecogenomic.org/releases/release202/202.0/). Training used the species representative genomes, whereas the benchmark datasets used non-representative whole genomes. We also simulated sequencing reads so that performance could be evaluated and compared on both whole genomes and sequencing reads. The dataset provides: the trained CNN models and their encoding files; the datasets used for training, validation and testing of the models, randomly sampled from representative genomes; and the datasets used for benchmarking the method against state-of-the-art methods, randomly sampled from non-representative whole genomes and simulated reads.
Provider:
Federated Research Data Repository / dépôt fédéré de données de recherche
Created:
2024-12-12



