EMDS-7 DiffuMorph: Latent Diffusion-Augmented Environmental Microorganism Dataset for Compound Microscopy
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/vh3nbj892g
Resource description:
The EMDS-7 DiffuMorph dataset is designed for automated classification of environmental microorganisms with machine learning and deep learning methods. It provides a class-balanced, reproducible benchmark for multi-class classification of compound microscope images.
This dataset is derived from the publicly available EMDS-7 (Environmental Microorganism Dataset – Seventh Version) and focuses exclusively on single-microorganism images to ensure reliable image-level classification. Multi-organism images were removed to maintain label integrity and classification consistency.
The original EMDS-7 dataset contains 41 microorganism classes. However, during dataset refinement, five classes were removed due to visual ambiguity, high morphological similarity, and potential redundancy between microorganism categories. Removing these classes helps reduce classification confusion and improves the reliability of machine learning model evaluation.
The five classes removed from the original EMDS-7 dataset are:
G008_Tribonema, G025_Coelosphaerium, G028_Synedra, G033_Coelastrum, and G039_Diversicornis.
After this refinement process, the final dataset contains 36 distinct microorganism classes with balanced representation across all categories.
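The class filter described above can be sketched as follows. This is a minimal illustration, assuming each class is stored under a name matching the identifiers listed above (for example `G008_Tribonema`). The function name `filter_classes` and the placeholder class names in the usage section are hypothetical and not part of the dataset's tooling.

```python
# Class identifiers removed during refinement, as listed in the description.
REMOVED_CLASSES = frozenset({
    "G008_Tribonema",
    "G025_Coelosphaerium",
    "G028_Synedra",
    "G033_Coelastrum",
    "G039_Diversicornis",
})


def filter_classes(class_names):
    """Return the class names that remain after dropping the removed ones.

    `class_names` is any iterable of class names (an assumed naming scheme).
    The input order is preserved.
    """
    return [name for name in class_names if name not in REMOVED_CLASSES]


if __name__ == "__main__":
    # Placeholder names stand in for the 36 retained classes.
    original = [f"class_{i:02d}" for i in range(36)] + sorted(REMOVED_CLASSES)
    kept = filter_classes(original)
    print(len(original), "->", len(kept))  # 41 -> 36
```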
The dataset consists of 2,700 compound microscope images, equally distributed across the 36 microorganism classes.
Each class includes:
• Original images selected from the EMDS-7 dataset after filtering multi-object samples.
• Augmented images produced with rotations, flips, scaling, and intensity adjustments that preserve microorganism morphology.
• Synthetic images generated with class-wise latent diffusion, adding realistic variability while preserving biologically plausible morphology.
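The description names the families of classical augmentation (rotations, flips, scaling, intensity adjustments) but not the parameters or tools used. The sketch below illustrates each family on a grayscale image stored as a nested list of 8-bit values. It is dependency-free for clarity, all function names are hypothetical, and a real pipeline would more likely rely on an image library. The latent diffusion step is not covered here.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def scale_nearest(img, factor):
    """Enlarge by an integer factor using nearest-neighbour repetition."""
    return [
        [p for p in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def adjust_intensity(img, gain=1.0, bias=0):
    """Scale and shift pixel values, clamping to the 8-bit range 0..255."""
    return [
        [min(255, max(0, round(p * gain + bias))) for p in row]
        for row in img
    ]


if __name__ == "__main__":
    tiny = [[10, 20],
            [30, 40]]
    print(flip_horizontal(tiny))        # [[20, 10], [40, 30]]
    print(rotate_90_clockwise(tiny))    # [[30, 10], [40, 20]]
    print(scale_nearest([[1, 2]], 2))   # [[1, 1, 2, 2], [1, 1, 2, 2]]
```

These transforms only rearrange or rescale pixel values, which is why they leave the organism's shape intact. Whether a given gain or bias still yields a realistic microscope image is a judgment the sketch does not make.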
Dataset Structure
Each of the 36 microorganism classes contains:
• Training set: 50 images
• Validation set: 15 images
• Test set: 10 images
Total images per class: 75
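One way to produce a 50/15/10 partition for a single class is sketched below. The description does not state how images were assigned to the splits, so the seeded shuffle, the function name `split_class`, and the file names in the usage note are assumptions for illustration only.

```python
import random

# Per-class split sizes taken from the description.
SPLIT_SIZES = {"train": 50, "val": 15, "test": 10}


def split_class(image_names, seed=0):
    """Partition one class's 75 image names into train/val/test lists.

    The seeded shuffle is an assumption; the dataset's actual assignment
    procedure is not documented in the description.
    """
    expected = sum(SPLIT_SIZES.values())
    if len(image_names) != expected:
        raise ValueError(f"expected {expected} images, got {len(image_names)}")
    shuffled = sorted(image_names)         # sort so input order does not matter
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    splits, start = {}, 0
    for split, size in SPLIT_SIZES.items():
        splits[split] = shuffled[start:start + size]
        start += size
    return splits
```

For example, `split_class([f"img_{i:03d}.png" for i in range(75)])` returns a dictionary with 50, 15, and 10 names under `"train"`, `"val"`, and `"test"`, and the same seed always yields the same partition.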
Dataset Statistics
• Original Classes in EMDS-7: 41
• Removed Classes: 5 (due to redundancy and ambiguity)
• Final Classes Used: 36
• Training Images: 1,800
• Validation Images: 540
• Test Images: 360
• Total Images: 2,700
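The totals above follow directly from the class count and the per-class split sizes. A short check, using only figures stated in this description:

```python
NUM_CLASSES = 41 - 5  # original EMDS-7 classes minus the removed classes
PER_CLASS = {"train": 50, "val": 15, "test": 10}

# Multiply each per-class split size by the number of classes.
totals = {split: n * NUM_CLASSES for split, n in PER_CLASS.items()}
total_images = sum(totals.values())

print(NUM_CLASSES)   # 36
print(totals)        # {'train': 1800, 'val': 540, 'test': 360}
print(total_images)  # 2700
```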
Applications
This dataset can support research in:
• Environmental microorganism classification
• Microscopy image analysis
• Deep learning and computer vision
• Morphological pattern recognition
• Ecological monitoring and water quality assessment
Created:
2026-03-16



