Synthetic Meets Authentic: Leveraging Text-to-Image Generated Datasets for Apple Detection in Orchard Environments

Name: Synthetic Meets Authentic: Leveraging Text-to-Image Generated Datasets for Apple Detection in Orchard Environments
Creator: Mendeley Data
License: No description available

Source: doi.org (indexed 2025-01-15)

Download link:

http://doi.org/10.17632/j739ptz54k.1

Description:

Training machine learning (ML) models for computer vision-based object detection typically requires large, labeled datasets, whose collection is often burdened by significant human effort and the high cost of imaging systems and image acquisition. This research aimed to simplify image data collection for object detection in orchards by avoiding traditional fieldwork with different imaging sensors. Using OpenAI's DALL-E, a large language model (LLM) for realistic image generation, we generated and annotated a cost-effective dataset. This dataset, generated exclusively from text-to-image prompts, was then used to train a deep learning model, YOLOv8, for apple detection. The trained model was tested on real-world (outdoor orchard) images captured by a digital camera (Nikon D5100) and a machine vision camera (Intel RealSense D435i). In training, the model achieved a precision of 0.83, a recall of 0.99, an F1 score of 0.92, and an mAP@50 of 0.96. Validation tests on real images of two apple varieties (Honeycrisp and Envy), collected in a commercial orchard, showed a precision of 0.82 and 0.75, a recall of 0.88 and 0.63, and an mAP@50 of 0.92 and 0.70, respectively. The inference time of the model was 0.015 seconds for the digital camera images and 0.012 seconds for the machine vision camera images. This study presents a pathway for generating large image datasets in challenging agricultural fields with minimal or no labor-intensive field data collection, which could accelerate the development and deployment of computer vision and robotic technologies in orchard environments.
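The description above reports precision, recall, and per-image inference time, but gives no F1 score for the two validation sets and no throughput figure. The following is a minimal Python sketch of how those derived values could be computed from the reported numbers. The function names `f1_score` and `frames_per_second` are illustrative assumptions and are not part of the dataset or of any library; the derived values are arithmetic on the published figures, not results reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frames_per_second(inference_seconds: float) -> float:
    """Throughput implied by a per-image inference time."""
    return 1.0 / inference_seconds


# Validation precision and recall as reported in the description.
validation = {"Honeycrisp": (0.82, 0.88), "Envy": (0.75, 0.63)}
# Per-image inference times in seconds, by camera, as reported.
inference = {"Nikon D5100": 0.015, "Intel RealSense D435i": 0.012}

for variety, (p, r) in validation.items():
    print(f"{variety}: F1 = {f1_score(p, r):.2f}")
for camera, t in inference.items():
    print(f"{camera}: {frames_per_second(t):.1f} images/s")
```

Under these assumptions the implied validation F1 is roughly 0.85 for Honeycrisp and 0.68 for Envy, and the inference times correspond to about 67 and 83 images per second.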

Provider:

Mendeley Data
