"CNN-Attention Ensemble with Novelty Detection for Deep-Sea enDNA Classification Dataset"
Source: DataCite Commons. Updated: 2026-02-17. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/cnn-attention-ensemble-novelty-detection-deep-sea-endna-classification-2
Description:
"This dataset supports the development and evaluation of deep learning models for taxonomic classification of environmental DNA (eDNA) sequences from deep-sea ecosystems. It comprises ribosomal RNA sequences spanning 129 phylum-level taxa, derived and curated from two major public databases: SILVA 138.1 SSURef NR99 and the NCBI Nucleotide repository (16S prokaryotic and 18S eukaryotic rRNA). All sequences are fixed at 500 base pairs and provided in four ready-to-use stages: raw FASTA files, NCBI downloads, preprocessed NumPy arrays (one-hot encoded, 5-channel), and augmented training arrays generated via reverse complementation and random substitution mutation. Four-mer (4-mer) frequency feature vectors (256 dimensions) are included alongside sequence arrays to support hybrid CNN and k-mer-based model architectures. The dataset was constructed to train and benchmark a CNN-Attention Ensemble with Novelty Detection framework, achieving 96.97% top-1 accuracy and 98.75% top-3 accuracy on the held-out test set. Class distribution ranges from 7 to 350 samples per phylum (median: 34), with a stratified 70\/15\/15 train\/validation\/test split. This dataset is intended to advance reproducible benchmarking of deep learning approaches for marine eDNA biodiversity monitoring and metagenomic taxonomic profiling."
Provider:
IEEE DataPort
Created:
2026-02-17
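
The preprocessing steps named in the description (fixed 500 bp length, 5-channel one-hot encoding, 256-dimensional 4-mer frequencies, reverse complementation, and random substitution) can be illustrated with a minimal sketch. This is not the dataset authors' pipeline. The channel order A, C, G, T, N, the use of N for padding and unknown symbols, the normalization of 4-mer counts to frequencies, and the mutation rate parameter are all assumptions made for illustration, and every function name below is hypothetical.

```python
import random
from itertools import product

ALPHABET = "ACGTN"   # assumed channel order; the dataset's actual order may differ
SEQ_LEN = 500        # fixed sequence length stated in the description
K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**4 = 256 four-mers
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def fix_length(seq, length=SEQ_LEN):
    """Upper-case, truncate to `length`, and right-pad with N (assumed padding symbol)."""
    seq = seq.upper()[:length]
    return seq + "N" * (length - len(seq))


def one_hot(seq):
    """Return a len(seq) x 5 one-hot matrix; symbols outside ACGT map to the N channel."""
    rows = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        row[idx if idx >= 0 else ALPHABET.index("N")] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq):
    """Return 256 normalized 4-mer frequencies; windows containing non-ACGT symbols are skipped."""
    counts = [0] * len(KMERS)
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(KMERS)


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def random_substitution(seq, rate, rng):
    """Replace each A/C/G/T with a different base with probability `rate`; N is left unchanged."""
    out = []
    for base in seq:
        if base in "ACGT" and rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

The nested lists returned by `one_hot` and `kmer_frequencies` can be passed to `numpy.asarray` to obtain arrays of shape (500, 5) and (256,) for comparison with the distributed files. The layout of the published arrays should be checked against the dataset itself before relying on this ordering.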



