Disease Dataset of Wheat: Original, Augmented, and Balanced for Deep Learning
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/5gc7hwydwg
Description:
The dataset was collected from a wheat field in Bangladesh and covers five wheat leaf image categories. It contains 1,603 original images at a resolution of 1920 × 1080 pixels, divided into four disease classes and one healthy class: Black Point (303 images), Fusarium Foot Rot (250 images), Healthy Leaf (250 images), Leaf Blight (400 images), and Wheat Blast (400 images).
Because the original classes are uneven in size, data augmentation was applied to bring each class to 1,000 images. The resulting augmented dataset contains 5,000 images distributed equally across the five categories.
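The counts above can be checked with a few lines of arithmetic. The sketch below assumes that 1,000 is the final per-class size of the augmented dataset, which is the reading consistent with the stated total of 5,000; the variable and function names are illustrative only.

```python
# Original image counts per class, as stated in the description.
ORIGINAL_COUNTS = {
    "Black Point": 303,
    "Fusarium Foot Rot": 250,
    "Healthy Leaf": 250,
    "Leaf Blight": 400,
    "Wheat Blast": 400,
}
TARGET_PER_CLASS = 1000  # assumed per-class size of the augmented dataset


def summarize(counts, target):
    """Return the original total, the augmented total, and per-class expansion factors."""
    original_total = sum(counts.values())
    augmented_total = target * len(counts)
    # Expansion factor: how many times larger each class is after augmentation.
    factors = {name: target / n for name, n in counts.items()}
    return original_total, augmented_total, factors


original_total, augmented_total, factors = summarize(ORIGINAL_COUNTS, TARGET_PER_CLASS)
print(original_total)   # 1603
print(augmented_total)  # 5000
```

Under this assumption the smaller classes are expanded more strongly than the larger ones (for example 4.0 times for Healthy Leaf versus 2.5 times for Wheat Blast), which is worth keeping in mind when judging per-class results.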
For model training and evaluation, the augmented dataset is split into training (70%), testing (20%), and validation (10%) portions. This fixed split lets models be tuned on the validation set, assessed for generalization on held-out test images, and compared consistently across experiments.
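A 70/20/10 split of this kind can be sketched as follows. The dataset ships its own pre-made split, so this is only an illustration of the proportions, not the authors' procedure; the per-class shuffling, the fixed seed, and the file names are assumptions.

```python
import random


def split_class(items, train=0.7, test=0.2, seed=0):
    """Shuffle one class's items and cut them into train/test/validation lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    return {
        "train": items[:n_train],
        "test": items[n_train:n_train + n_test],
        "val": items[n_train + n_test:],  # the remainder, about 10%
    }


# Hypothetical file names standing in for one class of 1,000 augmented images.
files = [f"leaf_blight_{i:04d}.jpg" for i in range(1000)]
parts = split_class(files)
sizes = {name: len(part) for name, part in parts.items()}
print(sizes)  # {'train': 700, 'test': 200, 'val': 100}
```

Splitting each class separately keeps the three portions balanced, so with five classes of 1,000 images the totals would be 3,500, 1,000, and 500 images.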
The dataset is organized into three main directories:
1) Original Dataset: Contains the raw images captured directly in the field.
2) Augmented Dataset: Contains the augmented (synthetic) images generated to balance the class distribution.
3) Split Dataset: Contains the training, testing, and validation subsets derived from the augmented dataset.
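After downloading, the per-class counts in any of these directories can be verified with a short script. This is a minimal sketch that assumes each directory holds one subfolder per class and that images use common extensions; the actual folder names and file types inside the archive are not given in the description.

```python
from collections import Counter
from pathlib import Path

# Assumed image extensions; adjust to match the downloaded files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Count image files in each immediate subfolder of `root` (one subfolder per class)."""
    counts = Counter()
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for p in class_dir.rglob("*")
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return dict(counts)
```

For example, calling `count_images_per_class("Original Dataset")` on the extracted archive (the path is hypothetical) should report the five class sizes listed in the description if the layout matches this assumption.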
The dataset is intended as a resource for research on wheat disease classification, agricultural AI development, and deep learning-based plant disease identification.
Created:
2025-03-05



