Pixel segmentation model for Algerian grapevine varieties
Source: Zenodo. Updated 2025-09-03; indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17048166
Resource description:
0_main_data_preprocessing.py Summary
This script standardizes and processes a raw dataset of plant leaf images and their corresponding coordinate data for segmentation tasks. It prepares the data by resizing and aligning all samples to a uniform format, which is crucial for training deep learning models.
Inputs
A directory named TRAINING_DATA containing raw image files (e.g., .jpg, .png, .tiff).
Within the same directory, corresponding text files (.txt) for polygon coordinates of the leaf blade and veins.
Within the same directory, corresponding _info.csv files that link the image filenames to a unique sample ID.
Outputs
A new root directory named PROCESSED_DATA_FOR_SEGMENTATION containing the following subdirectories:
RGB_IMAGES: Resized and standardized .png image files.
BLADE_COORDS: Standardized text files (.txt) with coordinates for the leaf blade polygon.
VEIN_COORDS: Standardized text files (.txt) with coordinates for the leaf veins polygon.
PREPROCESSING_PLOTS: .png files that visually verify the correct transformation of images and coordinates by plotting the polygons as overlays on the processed images.
What the Code Does
Determines Target Size: It first scans all raw images and computes a single target size for all samples from the 95th percentile of the existing image dimensions. Only a small fraction of images exceed this size and must be scaled down, while every output still shares one standard size.
Standardizes Orientation: For each image, it checks the aspect ratio and rotates it to ensure the width is greater than or equal to the height. This and all subsequent transformations are also applied to the coordinate data.
Rescales and Pads: It rescales each image to fit the calculated target size while preserving the original aspect ratio. Any empty space is then padded with white to reach the final dimensions.
Transforms Coordinates: It transforms the blade and vein coordinates to match the new scale and position of the padded image.
Saves Processed Files: It saves the final, standardized image and the transformed coordinate data to their respective output directories.
Generates Visual Checks: It creates an overlay plot of the new coordinates on the new image, providing a quick visual check for quality assurance and correct data transformation.
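The orientation, rescaling, padding, and coordinate steps above can be sketched with NumPy alone. This is a minimal illustration, not the script itself, and it rests on assumptions the description does not state: the percentile is taken after each size is put in landscape orientation, portrait images are rotated 90 degrees clockwise, the scaled image is centred on the white canvas, points are (x, y) pixel coordinates, and nearest-neighbour resampling stands in for whatever resize the script actually uses. All function names are hypothetical.

```python
import numpy as np

def target_size(sizes, q=95):
    """Hypothetical: (width, height) target from the q-th percentile of landscape-oriented sizes."""
    landscape = np.array([(max(w, h), min(w, h)) for w, h in sizes], dtype=float)
    tw, th = np.percentile(landscape, q, axis=0)
    return int(round(tw)), int(round(th))

def _fit(width, height, target_w, target_h):
    """Uniform scale that fits (width, height) inside the target, plus centring offsets."""
    scale = min(target_w / width, target_h / height)
    new_w = min(target_w, max(1, round(width * scale)))
    new_h = min(target_h, max(1, round(height * scale)))
    return scale, new_w, new_h, (target_w - new_w) // 2, (target_h - new_h) // 2

def standardize_image(img, target_w, target_h):
    """Rotate portrait images to landscape, rescale keeping aspect ratio, pad with white."""
    if img.shape[0] > img.shape[1]:               # portrait: rotate 90 degrees clockwise
        img = np.rot90(img, k=-1)
    h, w = img.shape[:2]
    scale, new_w, new_h, off_x, off_y = _fit(w, h, target_w, target_h)
    # Nearest-neighbour resampling keeps the sketch dependency-free;
    # a real pipeline would use an interpolating resize (e.g. Pillow or OpenCV).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    canvas = np.full((target_h, target_w) + img.shape[2:], 255, dtype=img.dtype)  # white padding
    canvas[off_y:off_y + new_h, off_x:off_x + new_w] = img[rows][:, cols]
    return canvas

def standardize_coords(points, width, height, target_w, target_h):
    """Apply the same rotation, scale and offset to (x, y) polygon points."""
    pts = np.asarray(points, dtype=float)
    if height > width:
        # Clockwise rotation maps pixel (x, y) to (height - 1 - y, x)
        pts = np.column_stack([height - 1 - pts[:, 1], pts[:, 0]])
        width, height = height, width
    scale, _, _, off_x, off_y = _fit(width, height, target_w, target_h)
    return pts * scale + np.array([off_x, off_y], dtype=float)
```

Because `standardize_image` and `standardize_coords` share `_fit`, the polygons drawn for the PREPROCESSING_PLOTS overlays line up with the padded image by construction, which is exactly what those plots are meant to verify.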
1_generate_ground_truth_masks.py Summary
This script converts the standardized coordinate data from the previous step into pixel-level segmentation masks, which serve as the ground truth for training a segmentation model.
Inputs
A directory named PROCESSED_DATA_FOR_SEGMENTATION with the following subdirectories:
RGB_IMAGES: The standardized .png images.
BLADE_COORDS: Text files (.txt) containing the transformed blade coordinates.
VEIN_COORDS: Text files (.txt) containing the transformed vein coordinates.
Outputs
A new subdirectory within the PROCESSED_DATA_FOR_SEGMENTATION directory named GROUND_TRUTH_MASKS.
This directory will contain single-channel, grayscale .png image files that represent the pixel-level ground truth. The pixel values are encoded as follows:
0: Background
1: Leaf Blade
2: Leaf Veins
What the Code Does
Iterates Over Images: It loops through all the standardized .png images to ensure a mask is created for each one.
Creates Blank Mask: For each image, it creates a new, blank grayscale image with a black background (pixel value 0) that matches the dimensions of the standardized image.
Draws Polygons: It reads the standardized blade and vein coordinates and uses them to draw filled polygons on the blank mask.
The blade polygon is filled with a pixel value of 1.
The vein polygon is filled with a pixel value of 2, overwriting any blade pixels that it overlaps.
Saves Masks: The completed grayscale mask image is saved with a corresponding filename in the new output directory, ready for use in model training.
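A dependency-free sketch of the mask rasterisation, assuming each coordinate file holds a single (x, y) polygon. The even-odd ray-casting fill below is a stand-in for whatever polygon-drawing routine the script uses (for example Pillow's `ImageDraw.polygon`), and its handling of boundary pixels may differ by one pixel. `polygon_mask` and `build_mask` are hypothetical names.

```python
import numpy as np

def polygon_mask(points, height, width):
    """Boolean mask of pixels inside a polygon, by even-odd ray casting.

    Pixel (row y, column x) is tested at the point (x, y)."""
    pts = np.asarray(points, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    for (ax, ay), (bx, by) in zip(pts, np.roll(pts, -1, axis=0)):
        # Does this edge straddle the horizontal line through the pixel?
        straddles = (ay > ys) != (by > ys)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = ax + (ys - ay) * (bx - ax) / (by - ay)
        # Toggle pixels left of the crossing (ray cast towards +x)
        inside ^= straddles & (xs < x_cross)
    return inside

def build_mask(blade, veins, height, width):
    mask = np.zeros((height, width), dtype=np.uint8)   # 0 = background
    mask[polygon_mask(blade, height, width)] = 1       # 1 = leaf blade
    mask[polygon_mask(veins, height, width)] = 2       # 2 = veins, drawn last so they overwrite blade
    return mask
```

Drawing the vein polygon after the blade polygon is what produces the overwrite rule described above; the resulting `uint8` array can be written as a single-channel PNG.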
2_generate_11channel_inputs.py Summary
This script creates enhanced input data for a segmentation model by generating an 11-channel NumPy array for each image. It combines the original RGB data with additional features derived from different color spaces and image filters. This enriches the input and helps the model better distinguish between different structures like veins and the leaf blade.
Inputs
A directory named PROCESSED_DATA_FOR_SEGMENTATION with the RGB_IMAGES subdirectory containing standardized .png images.
Outputs
A new subdirectory within the PROCESSED_DATA_FOR_SEGMENTATION directory named 11CHANNEL_INPUTS.
This directory contains a series of .npy files. Each file is a single NumPy array with a shape of (H, W, 11), where H and W are the height and width of the standardized images, and 11 is the number of channels. The channels are ordered as follows:
0: Red (R)
1: Green (G)
2: Blue (B)
3: Grayscale
4: L* (from Lab color space)
5: a* (from Lab color space)
6: b* (from Lab color space)
7: Sato ridge filter response
8: Meijering ridge filter response
9: Frangi ridge filter response
10: Hessian ridge filter response
What the Code Does
Loads and Converts: It loads each standardized RGB image and converts it to a floating-point NumPy array normalized to the 0-1 range.
Generates New Channels: It applies a series of transformations and filters to the image to create new feature channels. These include:
Grayscale: A single channel representing the image's luminance.
Lab Color Space: Three channels (L*, a*, and b*) that separate luminance from color information, which can be useful for robust segmentation.
Ridge Filters: Four filters (Sato, Meijering, Frangi, and Hessian) that enhance fine, tube-like structures such as plant veins by responding strongly to them. Each filter's output is then contrast-enhanced to highlight these features.
Combines Channels: All 11 channels (RGB, Grayscale, Lab, and four ridge filter outputs) are stacked together into a single, multi-dimensional NumPy array.
Saves NumPy Array: The final 11-channel array is saved as a .npy file, NumPy's native binary format, which loads quickly and can be fed directly to a deep learning model.
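The channel stack could be assembled with scikit-image, which provides `sato`, `meijering`, `frangi` and `hessian` ridge filters under those names. This is a sketch, not the author's code: filter parameters are left at scikit-image defaults (which look for dark ridges), CLAHE (`exposure.equalize_adapthist`) is assumed as the contrast enhancement, and the Lab channels are rescaled to roughly 0-1, a choice the description does not specify. `make_11_channels` and `ridge_channel` are hypothetical names.

```python
import numpy as np
from skimage import color, exposure, filters

def ridge_channel(gray, ridge_filter):
    # Rescale the raw filter response to [0, 1], then apply CLAHE as contrast enhancement
    response = exposure.rescale_intensity(ridge_filter(gray), out_range=(0.0, 1.0))
    return exposure.equalize_adapthist(response)

def make_11_channels(rgb_uint8):
    """(H, W, 3) uint8 image -> (H, W, 11) float32 array in the channel order listed above."""
    rgb = rgb_uint8.astype(np.float64) / 255.0             # channels 0-2: R, G, B in [0, 1]
    gray = color.rgb2gray(rgb)                             # channel 3: luminance
    lab = color.rgb2lab(rgb)
    lab_scaled = np.stack([lab[..., 0] / 100.0,            # channel 4: L* (0-100 -> 0-1)
                           (lab[..., 1] + 128.0) / 255.0,  # channel 5: a*
                           (lab[..., 2] + 128.0) / 255.0], # channel 6: b*
                          axis=-1)
    ridges = [ridge_channel(gray, f) for f in              # channels 7-10
              (filters.sato, filters.meijering, filters.frangi, filters.hessian)]
    stacked = np.concatenate([rgb, gray[..., None], lab_scaled,
                              np.stack(ridges, axis=-1)], axis=-1)
    return stacked.astype(np.float32)
```

The result can be written with `np.save(path, make_11_channels(image))`. Whether vein pixels are darker or lighter than the blade in these photographs determines the appropriate `black_ridges` setting, so that parameter would need checking against the real data.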
3_generate_geodesic_masks.py Summary
This script generates an additional target for the multi-task model: a geodesic distance map. For each vein pixel, the map gives the shortest-path distance from a single starting point (the leaf's base), travelling only through vein pixels. Used as a second training target in 4_training.py, it gives the model explicit information about the leaf's vascular structure and spatial relationships.
Inputs
A directory named PROCESSED_DATA_FOR_SEGMENTATION with the following subdirectories:
RGB_IMAGES: The standardized .png images.
VEIN_COORDS: Text files (.txt) containing the transformed vein coordinates, used to determine the leaf's base.
GROUND_TRUTH_MASKS: The three-class masks generated by 1_generate_ground_truth_masks.py, used to identify the precise pixels that constitute the veins (pixel value 2).
Outputs
A new subdirectory within PROCESSED_DATA_FOR_SEGMENTATION named GEODESIC_MASKS.
This directory contains a series of .npy files, each representing a single-channel geodesic distance map. The pixel values in these arrays range from 0 (at the leaf's base) to 1 (at the furthest vein tip), with all non-vein pixels set to 0.
Another new subdirectory named GEODESIC_OVERLAY_PLOTS.
This directory contains .png files that visually confirm the generated maps by overlaying a colormap of the geodesic distances on top of the original RGB images.
What the Code Does
Identifies the Starting Point: It uses the vein coordinate data to calculate the leaf's petiolar base (the point where the stem connects to the leaf). It then finds the single vein pixel that is closest to this calculated base point.
Performs Geodesic Pathfinding: It runs a Breadth-First Search (BFS) starting from that base pixel. The search only moves to neighboring pixels that are also part of the vein mask, so the distance is measured along the vein network rather than across the entire leaf.
Creates and Normalizes the Map: The BFS algorithm populates a distance map, which is then normalized so that all distances fall within the 0-1 range. This makes the data consistent for training a neural network.
Saves Data and Visualizations: The final normalized map is saved as a .npy file. For quality control, the script also generates a visual plot, showing the RGB image with the geodesic map overlaid as a gradient, allowing collaborators to immediately see if the process worked correctly for each sample.
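The BFS and normalisation steps above can be sketched as follows. This assumes 8-connected neighbours with every step (including diagonal ones) costing 1, that the petiolar base has already been computed as an (x, y) point (how the script derives it from the vein coordinates is not shown), and that vein pixels unreachable from the base stay at 0. `geodesic_map` is a hypothetical name.

```python
from collections import deque

import numpy as np

def geodesic_map(vein_mask, base_xy):
    """Normalised BFS distance from the vein pixel nearest base_xy, moving only through veins."""
    vein = np.asarray(vein_mask, dtype=bool)
    ys, xs = np.nonzero(vein)
    if ys.size == 0:
        return np.zeros(vein.shape, dtype=np.float32)
    bx, by = base_xy
    i = np.argmin((xs - bx) ** 2 + (ys - by) ** 2)   # vein pixel closest to the base point
    start = (ys[i], xs[i])
    dist = np.full(vein.shape, -1, dtype=np.int64)   # -1 marks "not reached yet"
    dist[start] = 0
    queue = deque([start])
    h, w = vein.shape
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    while queue:
        y, x = queue.popleft()
        for dy, dx in neighbours:
            ny, nx = y + dy, x + dx
            # Only step onto in-bounds, unvisited vein pixels
            if 0 <= ny < h and 0 <= nx < w and vein[ny, nx] and dist[ny, nx] < 0:
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    out = np.zeros(vein.shape, dtype=np.float32)     # non-vein pixels stay 0
    reached = dist >= 0
    if dist.max() > 0:
        out[reached] = dist[reached] / dist.max()   # 0 at the base, 1 at the furthest tip
    return out
```

BFS with unit step costs yields exact shortest path lengths in steps; if diagonal moves should cost more than straight ones, a Dijkstra search would be needed instead.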
4_training.py Summary
This script trains a multi-task UNet model to perform two related tasks simultaneously: pixel-wise segmentation of leaf parts (background, blade, and veins) and geodesic distance prediction along the veins. This approach leverages the shared information between the tasks to improve overall performance, especially for the challenging vein segmentation.
Inputs
11CHANNEL_INPUTS directory: Contains the multi-channel NumPy arrays generated by 2_generate_11channel_inputs.py.
GROUND_TRUTH_MASKS directory: Contains the segmentation masks (.png files) generated by 1_generate_ground_truth_masks.py, with pixel values representing background (0), blade (1), or vein (2).
GEODESIC_MASKS directory: Contains the geodesic distance masks (.npy files) generated by 3_generate_geodesic_masks.py, with values from 0 to 1 for vein pixels and 0 for all others.
An optional bad_fids.txt file listing any data samples to be excluded from training.
Outputs
A checkpoints_vinifera directory containing the trained model's parameters, saved periodically and at the end of training.
A training_config.json file that logs all the hyperparameters and configurations used for the training run, ensuring reproducibility.
A plot (training_history.png) showing the loss and metric scores over each epoch for both the training and validation sets.
What the Code Does
Data Loading: It defines a custom MultiChannelLeafDataset class that loads the 11-channel input arrays, the multi-class segmentation masks, and the geodesic distance masks. It also handles data augmentation (random flips) and filters out any "bad" data samples.
Model Definition: It defines a U-Net architecture with a key modification: instead of a single output, it has two separate output heads.
The first head is a classification layer for the 3-class segmentation task.
The second head is a regression layer that predicts the geodesic distance, constrained to a 0-1 range.
Loss Calculation: During training, it calculates a total loss that is a weighted sum of two individual losses:
Segmentation Loss: A Cross-Entropy Loss that compares the model's segmentation output to the ground truth mask. It uses class weights to give more importance to the small and critical vein class.
Geodesic Loss: A Mean Squared Error (MSE) Loss that measures the difference between the model's geodesic prediction and the ground truth geodesic map. Importantly, this loss is calculated only on the vein pixels, as non-vein pixels have a value of 0 and would skew the result.
Training Loop: It iterates through epochs, feeding the data into the model. In each epoch, it performs the following:
Calculates the combined loss and uses backpropagation to update the model's weights.
Tracks key metrics such as Dice Coefficient (for segmentation) and Mean Absolute Error (MAE) (for geodesic distance) for both the training and validation datasets.
Saves the model checkpoint with the best validation Vein Dice score and plots the training history for analysis.
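The augmentation, loss, and metric arithmetic described above can be written out in NumPy. Since the checkpoints are .pt files the script presumably uses PyTorch; this version only illustrates the calculations. Assumptions not taken from the description: horizontal and vertical flips each applied with probability 0.5, vein pixels for the masked MSE taken from the ground-truth mask (value 2), and the example class weights and geodesic weight. The weighted mean in `weighted_cross_entropy` matches PyTorch's `CrossEntropyLoss(weight=...)` with the default `reduction='mean'`. All names are hypothetical.

```python
import numpy as np

def random_flip(x, seg, geo, rng):
    """Apply identical random flips to the (H, W, C) input, (H, W) mask and (H, W) geodesic map."""
    if rng.random() < 0.5:                      # horizontal flip
        x, seg, geo = x[:, ::-1], seg[:, ::-1], geo[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        x, seg, geo = x[::-1], seg[::-1], geo[::-1]
    return x.copy(), seg.copy(), geo.copy()

def weighted_cross_entropy(logits, target, class_weights):
    """logits: (C, H, W); target: (H, W) ints. Weighted mean: sum(w_y * nll) / sum(w_y)."""
    shifted = logits - logits.max(axis=0, keepdims=True)              # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    nll = -np.take_along_axis(log_probs, target[None], axis=0)[0]     # -log p(true class)
    w = class_weights[target]
    return float((w * nll).sum() / w.sum())

def masked_mse(pred, target, vein):
    """MSE over vein pixels only; 0 if the sample has no veins."""
    if not vein.any():
        return 0.0
    return float(np.mean((pred[vein] - target[vein]) ** 2))

def total_loss(logits, seg_target, geo_pred, geo_target, class_weights, geo_weight=1.0):
    vein = seg_target == 2
    return (weighted_cross_entropy(logits, seg_target, class_weights)
            + geo_weight * masked_mse(geo_pred, geo_target, vein))

def dice(pred_labels, true_labels, cls=2, eps=1e-7):
    """Dice coefficient for one class (cls=2 gives the vein Dice used for checkpointing)."""
    p, t = pred_labels == cls, true_labels == cls
    return float((2.0 * (p & t).sum() + eps) / (p.sum() + t.sum() + eps))
```

For example, with `class_weights = np.array([0.2, 1.0, 5.0])` a misclassified vein pixel costs 25 times as much as a misclassified background pixel, which is the kind of re-weighting the description uses to emphasise the small vein class.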
5_predict.py Summary
This script uses a pre-trained UNet model to perform inference on new, unseen leaf images. The goal is not only to predict a segmentation mask for each image but also to automatically identify and extract each individual leaf as a separate, cropped component. This allows individual leaves, rather than entire images, to be analyzed.
Inputs
INFERENCE_INPUT directory: This is a new directory structure for inference. It should contain subfolders, each representing a different class (e.g., Class_A, Class_B), with raw, unprocessed leaf images (.jpg, .jpeg, or .png) inside.
Trained Model File (.pt): The path to the best-performing model checkpoint from the training stage (e.g., V1_best_model_vein_dice_0.7697_epoch29.pt).
Outputs
A new top-level directory named INFERENCE_OUTPUTS is created, containing three subfolders and a metadata file:
COMPONENT_MASKS: Contains .png files of the segmentation masks for each individually cropped leaf component. Each mask has three classes: background (0), blade (1), and vein (2).
COMPONENT_RGB_CROPS: Contains .png files of the original RGB image, cropped to the bounding box of each individual leaf component.
COMPONENT_OVERLAYS: Contains .png files of the cropped RGB images with the segmentation prediction overlaid, making it easy to visually inspect the results. The vein predictions are highlighted using a vibrant colormap for clarity.
component_metadata.csv: A comprehensive .csv file that acts as a manifest for all extracted components. It contains key information for each leaf, including:
The path to the original source image.
A unique global identifier for the component.
Pixel counts for the blade, vein, and internal background "holes."
The coordinates and dimensions of the bounding box.
File names for the corresponding mask, crop, and overlay images.
What the Code Does
Preparation: It loads the pre-trained UNet model and sets it to evaluation mode. It also makes the run resumable: it checks for an existing metadata CSV file and, if one is found, continues from where the previous run left off, avoiding redundant work.
Image Processing: For each new image in the input folders, the script:
Re-generates the 11-channel input array using the same pre-processing steps as the training phase to ensure compatibility with the model.
Passes the 11-channel data through the UNet model to get the segmentation prediction.
Component Extraction: The script then performs connected component analysis on the predicted segmentation mask. This process identifies each separate, contiguous "leaf" object.
Filtering: It applies a filter to discard any components that are too small (e.g., small debris or isolated noise pixels) based on a configurable minimum bounding box size.
Cropping and Saving: For each valid leaf component identified, it performs the following:
Crops the original RGB image and the predicted segmentation mask to the component's bounding box.
Calculates and stores metadata, such as pixel counts for each class.
Saves the cropped RGB image, the cropped segmentation mask, and a custom visual overlay to their respective output directories.
Metadata Management: All collected metadata is compiled into a single DataFrame and incrementally saved to the component_metadata.csv file, providing a structured, persistent record of the entire inference process.
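The component extraction, filtering, and resume check can be sketched with `scipy.ndimage`. Assumptions not in the description: a component is any 4-connected region of blade or vein pixels (the default structuring element of `ndimage.label`), "minimum bounding box size" means the shorter bounding-box side, internal holes are background pixels enclosed by the component, and the metadata keys and the `source_image` column name are hypothetical.

```python
import csv
import os

import numpy as np
from scipy import ndimage

def extract_components(pred, rgb, min_side=50):
    """Split a predicted (H, W) mask (0/1/2) into per-leaf crops plus metadata rows."""
    labels, _ = ndimage.label(pred > 0)            # default structure: 4-connectivity
    records, crops = [], []
    for idx, slc in enumerate(ndimage.find_objects(labels), start=1):
        if slc is None:
            continue
        rows, cols = slc
        h, w = rows.stop - rows.start, cols.stop - cols.start
        if min(h, w) < min_side:                   # drop debris and isolated noise
            continue
        component = labels[slc] == idx             # exclude other leaves inside the box
        mask_crop = np.where(component, pred[slc], 0).astype(np.uint8)
        holes = ndimage.binary_fill_holes(component) & ~component
        records.append({
            "component_id": idx,
            "blade_pixels": int((mask_crop == 1).sum()),
            "vein_pixels": int((mask_crop == 2).sum()),
            "hole_pixels": int(holes.sum()),
            "bbox_x": cols.start, "bbox_y": rows.start, "bbox_w": w, "bbox_h": h,
        })
        crops.append((rgb[slc].copy(), mask_crop))
    return records, crops

def processed_sources(csv_path, column="source_image"):
    """Source images already listed in an existing metadata CSV (empty set if none)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}
```

A resumable loop would skip any image whose path is already in `processed_sources(...)` and append the new rows after each image, so an interrupted run loses at most the image in progress.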
Provider:
Zenodo
Created:
2025-09-03



