
3SeasonWeedDet10: a three-season, 10-class dataset for benchmarking AI models for robust weed detection

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14861515
Description:
3SeasonWeedDet10 is a three-season dataset of images covering 10 weed classes, acquired in three consecutive years. The ten classes, identified by common weed names, are Carpetweed, Eclipta, Goosegrass, Lambsquarters, Morningglory, Ragweed, Palmer Amaranth, Purslane, Spotted spurge, and Waterhemp. Although the dataset was curated specifically as a testbed for training ControlNet-added stable diffusion models for weed image generation and enhanced weed detection, it can be used more generally to evaluate computer vision algorithms and artificial intelligence (AI) models for weed detection, especially in cross-season generation studies.

The dataset contains three subsets, data2021, data2022, and data2023, holding the data collected in 2021, 2022, and 2023, respectively.

data2021: derived from our previous CottonWeedDet12 dataset, this subset has 4704 images with 7892 annotated weed instances (bounding boxes).

data2022: derived from our earlier two-season dataset 2SeasonWeedDet8, this subset has 1948 images with 3229 bounding boxes.

Images in both 2021 and 2022 were acquired in row crop fields under natural light conditions using smartphones or digital color cameras on different farms of Mississippi State University.

data2023: a new set of images acquired from different field sites of the Horticultural Research and Extension Center (Holt, MI) of Michigan State University. The subset contains 1748 images with 16,842 bounding box annotations, so the 2023 images contain far more weed instances than those of either earlier year. Although handheld devices were also used for image collection, the majority of these images were captured automatically using a motorized vision platform with dedicated software programs.

In total, the dataset consists of 8463 images with 27,963 bounding boxes, amounting to over 41 GB in file size.
All images are named in a consistent, self-explanatory fashion (including date, imaging device, and personnel information). As in our prior dataset curation efforts, images were manually labeled by qualified personnel, who drew bounding boxes around individual weed plants using the VGG Image Annotator (version 2.10). For quality control, the initial annotations were examined by personnel trained in weed identification before inclusion in the final dataset.

Each weed image (in .jpg format) has a corresponding annotation file of the same file name in both .JSON and .XML formats, placed in the same subset folder. In the .JSON file, each annotated bounding box is defined in COCO format, i.e., [x_min, y_min, width, height]. In the .XML file, each bounding box is represented in Pascal VOC format, i.e., [x_min, y_min, x_max, y_max].

More information about the dataset and the performance evaluation of deep generative modeling for weed generation and detection is given in the accompanying journal paper:

Deng, B., Lu, Y., 2025. Weed Image Augmentation by ControlNet-Added Stable Diffusion for Multi-Class Weed Detection. Computers and Electronics in Agriculture, 110123. https://doi.org/10.1016/j.compag.2025.110123.

If you use the dataset in published research, please consider citing the dataset or the associated journal article above. We hope you find this dataset useful.
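
As an illustration of the two bounding-box conventions described above, the following is a minimal Python sketch that converts between COCO-style [x_min, y_min, width, height] and Pascal VOC-style [x_min, y_min, x_max, y_max] boxes and reads boxes from a VOC-style XML string. The function names, the sample XML, and its coordinates are hypothetical. The parser assumes the conventional Pascal VOC element layout (object, name, bndbox), which may differ in detail from the files shipped with the dataset.

```python
import xml.etree.ElementTree as ET


def coco_to_voc(box):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def voc_to_coco(box):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def read_voc_boxes(xml_text):
    """Return (class_name, [x_min, y_min, x_max, y_max]) pairs from VOC-style XML.

    Assumption: each annotated plant is an <object> element holding a <name>
    and a <bndbox> with <xmin>, <ymin>, <xmax>, <ymax> children.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        coords = [int(float(bnd.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), coords))
    return boxes


# Hypothetical annotation, made up for illustration only.
SAMPLE_XML = """<annotation>
  <object>
    <name>Waterhemp</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>420</xmax><ymax>330</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    for name, voc_box in read_voc_boxes(SAMPLE_XML):
        # Print the class, the VOC box, and its COCO equivalent.
        print(name, voc_box, voc_to_coco(voc_box))
```

Because each image has both a .JSON and an .XML annotation, a round trip such as coco_to_voc(voc_to_coco(box)) == box gives a simple consistency check when comparing the two files for the same image.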
Created:
2025-02-13