TGTomato-Balanced: A Class-Balanced Derivative Dataset for Truss Tomato Detection
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/344hxx5rkf
Description:
TGTomato-Balanced is a class-balanced derivative dataset for truss tomato fruit detection and classification in precision agriculture. It addresses the class imbalance in the original "2022 Dataset of String Tomato in Shanxi Nonggu Tomato Town" (Science Data Bank, CSTR:31253.11.sciencedb.05228) by augmenting the minority class and downsampling the majority class.
KEY FEATURES:
- Total of 4,505 images (3,665 original + augmented samples)
- Three categories: mature tomato, stem, and raw (green) tomato
- YOLO format annotations with bounding boxes
- High-resolution images: 2736×3648 to 3000×4000 pixels
- Balanced class distribution
- Split ratio: 8:1:1 (train/val/test)
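As an illustration of the YOLO annotation format listed above, the following minimal Python sketch converts one label line to pixel coordinates. It assumes the common YOLO text convention (`class_id x_center y_center width height`, with coordinates normalized to the range 0 to 1). The function name `yolo_to_pixel_box` and the example image size are hypothetical, and the README shipped with the dataset remains the authority on the exact layout and class order.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes the line is 'class_id x_center y_center width height'
    with all four coordinates normalized to [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # Hypothetical label line on an image assumed to be 3000 px wide and 4000 px tall.
    print(yolo_to_pixel_box("0 0.5 0.5 0.5 0.25", 3000, 4000))
```

Which class index corresponds to mature tomato, stem, or raw tomato depends on the order in classes.txt, which is not stated here.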
MODIFICATIONS FROM ORIGINAL DATASET:
1. Data augmentation applied to minority class (raw tomatoes): random saturation, brightness, contrast adjustments, Gaussian noise, horizontal flipping, and Gaussian blurring
2. Downsampling of majority class (removed 302 images from mature-only subset)
3. Resulting dataset distribution: Train (3,603), Val (451), Test (451), 4,505 images in total
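Of the augmentations listed above, horizontal flipping is the only one that changes bounding-box coordinates; the photometric ones (saturation, brightness, contrast, noise, blur) leave the labels untouched. The sketch below shows how YOLO labels could be mirrored to match a flipped image. It is not the authors' pipeline: the function name `hflip_yolo_labels` is hypothetical, and the normalized `class_id x_center y_center width height` layout is assumed.

```python
def hflip_yolo_labels(lines):
    """Mirror YOLO label lines to match a horizontally flipped image.

    With normalized coordinates, only x_center changes: x -> 1 - x.
    The y_center, width, and height values stay the same.
    """
    flipped = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cls, xc, yc, w, h = line.split()
        new_xc = 1.0 - float(xc)
        flipped.append(
            f"{cls} {new_xc:.6f} {float(yc):.6f} {float(w):.6f} {float(h):.6f}"
        )
    return flipped


if __name__ == "__main__":
    # Hypothetical label: a box left of center moves to right of center.
    print(hflip_yolo_labels(["2 0.25 0.5 0.1 0.2"]))
```

The image itself would be mirrored separately with an image library of choice, and the flipped image and label file saved under a new shared base name.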
DATA COLLECTION (Original Dataset):
- Location: Gezitou Village, Taigu District, Shanxi Province, China
- Period: July-August 2022
- Conditions: Natural sunlight, sunny/overcast, morning to dusk
- Devices: iPhone 13 Pro Max, Huawei P30/Nova 5z, OPPO A91
- Viewing angles: 10°, 45°, 90°, 135°, 170°, plus top-down and forward/backward views
USE CASES:
- Object detection model training (YOLO, Faster R-CNN)
- Tomato maturity classification
- Agricultural robotics and automated harvesting
- Computer vision research in precision agriculture
- Class imbalance handling studies
LICENSE: CC BY 4.0 (same as original dataset)
CITATION REQUIREMENT: Users must cite BOTH this derivative dataset and the original dataset (Song Guo Zhu, SHI Yan, Wang Jian, et al., 2023).
This derivative work maintains full compliance with CC BY 4.0 license terms, providing appropriate attribution to the original dataset creators and clearly documenting all modifications made.
File structure includes: images/ (train/val/test), labels/ (train/val/test), classes.txt, and comprehensive README documentation.
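To check the class balance after downloading, a short script along the following lines could count annotated instances per class in one split. It is a sketch under assumptions not confirmed by the description: label files are taken to live at labels/<split>/*.txt, classes.txt is taken to sit at the dataset root with one class name per line in index order, and the function name `count_instances` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_instances(dataset_root, split="train"):
    """Count bounding-box instances per class name in one split.

    Assumed layout: <root>/classes.txt and <root>/labels/<split>/*.txt,
    with YOLO lines starting with an integer class index.
    """
    root = Path(dataset_root)
    names = [
        n.strip()
        for n in (root / "classes.txt").read_text(encoding="utf-8").splitlines()
        if n.strip()
    ]
    counts = Counter()
    for label_file in sorted((root / "labels" / split).glob("*.txt")):
        for line in label_file.read_text(encoding="utf-8").splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            class_id = int(parts[0])
            # Fall back to a generic name if the index is not in classes.txt.
            name = names[class_id] if class_id < len(names) else f"class_{class_id}"
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # "TGTomato-Balanced" is a placeholder path for the extracted dataset.
    for split in ("train", "val", "test"):
        print(split, dict(count_instances("TGTomato-Balanced", split)))
```

Note that this counts annotated boxes rather than images, so the totals will differ from the image counts given for each split.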
Created:
2025-12-25



