simonMadec/VegAnn

Name: simonMadec/VegAnn
Creator: simonMadec
Published: 2024-02-10 08:59:52
License: No description available

Hugging Face | Updated 2024-02-10 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/simonMadec/VegAnn

Resource description:

---
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- image-segmentation
tags:
- vegetation
- segmentation
DOI:
- 10.1038/s41597-023-02098-y
licence:
- CC-BY
dataset_info:
  features:
  - name: image
    dtype: image
  - name: mask
    dtype: image
  - name: System
    dtype: string
  - name: Orientation
    dtype: string
  - name: latitude
    dtype: float64
  - name: longitude
    dtype: float64
  - name: date
    dtype: string
  - name: LocAcc
    dtype: int64
  - name: Species
    dtype: string
  - name: Owner
    dtype: string
  - name: Dataset-Name
    dtype: string
  - name: TVT-split1
    dtype: string
  - name: TVT-split2
    dtype: string
  - name: TVT-split3
    dtype: string
  - name: TVT-split4
    dtype: string
  - name: TVT-split5
    dtype: string
  splits:
  - name: train
    num_bytes: 1896819757.9
    num_examples: 3775
  download_size: 1940313757
  dataset_size: 1896819757.9
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# VegAnn Dataset

### **Vegetation Annotation of a large multi-crop RGB Dataset acquired under diverse conditions for image semantic segmentation**

## Keypoints ⏳

- VegAnn contains 3775 images
- Images are 512x512 pixels
- The corresponding binary masks use 0 for soil and crop residues (background) and 255 for vegetation (foreground)
- The dataset includes images of 26+ crop species, which are not evenly represented
- VegAnn was compiled using a variety of outdoor images captured with different acquisition systems and configurations
- For more information about VegAnn, including details, labeling rules and potential uses, see https://doi.org/10.1038/s41597-023-02098-y

## Dataset Description 📚

VegAnn, short for Vegetation Annotation, is a meticulously curated collection of 3,775 multi-crop RGB images aimed at enhancing research in crop vegetation segmentation. These images span various phenological stages and were captured using diverse systems and platforms under a wide range of illumination conditions. By aggregating sub-datasets from different projects and institutions, VegAnn represents a broad spectrum of measurement conditions, crop species, and development stages.

### Languages 🌐

The annotations and documentation are primarily in English.

## Dataset Structure 🏗

### Data Instances 📸

A VegAnn data instance consists of a 512x512 pixel RGB image patch derived from larger raw images. These patches are designed to provide sufficient detail for distinguishing between vegetation and background, which is crucial for semantic segmentation and other forms of computer vision analysis in agricultural contexts.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645a05f09e55477fff862881/O-iKRqn8FRZnY9hBzmaU5.png)

### Data Fields 📋

- `Name`: Unique identifier for each image patch.
- `System`: The imaging system used to acquire the photo (e.g., Handheld Cameras, DHP, UAV).
- `Orientation`: The camera's orientation during image capture (e.g., Nadir, 45 degrees).
- `latitude` and `longitude`: Geographic coordinates where the image was taken.
- `date`: Date of image acquisition.
- `LocAcc`: Location accuracy flag (1 for high accuracy, 0 for low or uncertain accuracy).
- `Species`: The crop species featured in the image (e.g., Wheat, Maize, Soybean).
- `Owner`: The institution or entity that provided the image (e.g., Arvalis, INRAe).
- `Dataset-Name`: The sub-dataset or project from which the image originates (e.g., Phenomobile, Easypcc).
- `TVT-split1` to `TVT-split5`: Fields indicating the train/validation/test split configurations, facilitating various experimental setups.

### Data Splits 📊

The dataset is structured into multiple splits (as indicated by the `TVT-split` fields) to support different training, validation, and testing scenarios in machine learning workflows.

## Dataset Creation 🛠

### Curation Rationale 🤔

The VegAnn dataset was developed to address the gap in available datasets for training convolutional neural networks (CNNs) for semantic segmentation in real-world agricultural environments. By incorporating images from a wide array of conditions and stages of crop development, VegAnn aims to enhance the performance of segmentation algorithms, promote benchmarking, and foster research on large-scale crop vegetation segmentation.

### Source Data 🌱

#### Initial Data Collection and Normalization

Images within VegAnn were sourced from various sub-datasets contributed by different institutions, each under specific acquisition configurations. These were then standardized into 512x512 pixel patches to maintain consistency across the dataset.

#### Who are the source data providers?

The data was provided by a collaboration of institutions including Arvalis, INRAe, The University of Tokyo, University of Queensland, NEON, and EOLAB, among others.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645a05f09e55477fff862881/W7rF7P9oexd-Q7oBGV6aF.png)

### Annotations 📝

#### Annotation process

Annotations for the dataset were focused on distinguishing between vegetation and background within the images. The process ensured that the images offered sufficient spatial resolution to allow for accurate visual segmentation.

#### Who are the annotators?

The annotations were performed by a team comprising researchers and domain experts from the contributing institutions.

## Considerations for Using the Data 🤓

### Social Impact of Dataset 🌍

The VegAnn dataset is expected to significantly impact agricultural research and commercial applications by enhancing the accuracy of crop monitoring, disease detection, and yield estimation through improved vegetation segmentation techniques.

### Discussion of Biases 🧐

Given the diverse sources of the images, there may be inherent biases towards certain crop types, geographical locations, and imaging conditions. Users should consider this diversity in applications and analyses.

### Licensing Information 📄

Please refer to the specific licensing agreements of the contributing institutions or contact the dataset providers for more information on usage rights and restrictions.

## Citation Information 📚

If you use the VegAnn dataset in your research, please cite the following:

```
@article{madec_vegann_2023,
  title = {{VegAnn}, {Vegetation} {Annotation} of multi-crop {RGB} images acquired under diverse conditions for segmentation},
  volume = {10},
  issn = {2052-4463},
  url = {https://doi.org/10.1038/s41597-023-02098-y},
  doi = {10.1038/s41597-023-02098-y},
  abstract = {Applying deep learning to images of cropping systems provides new knowledge and insights in research and commercial applications. Semantic segmentation or pixel-wise classification, of RGB images acquired at the ground level, into vegetation and background is a critical step in the estimation of several canopy traits. Current state of the art methodologies based on convolutional neural networks (CNNs) are trained on datasets acquired under controlled or indoor environments. These models are unable to generalize to real-world images and hence need to be fine-tuned using new labelled datasets. This motivated the creation of the VegAnn - Vegetation Annotation - dataset, a collection of 3775 multi-crop RGB images acquired for different phenological stages using different systems and platforms in diverse illumination conditions. We anticipate that VegAnn will help improving segmentation algorithm performances, facilitate benchmarking and promote large-scale crop vegetation segmentation research.},
  number = {1},
  journal = {Scientific Data},
  author = {Madec, Simon and Irfan, Kamran and Velumani, Kaaviya and Baret, Frederic and David, Etienne and Daubige, Gaetan and Samatan, Lucas Bernigaud and Serouart, Mario and Smith, Daniel and James, Chrisbin and Camacho, Fernando and Guo, Wei and De Solan, Benoit and Chapman, Scott C. and Weiss, Marie},
  month = may,
  year = {2023},
  pages = {302},
}
```

## Additional Information

- **Dataset Curators**: Simon Madec et al.
- **Version**: 1.0
- **License**: CC-BY
- **Contact**: simon.madec@cirad.fr

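Provided by:

simonMadec

Summary of original information

VegAnn dataset overview

Dataset description

VegAnn (Vegetation Annotation) is a carefully curated collection of multi-crop RGB images intended to support research on crop vegetation segmentation. The dataset contains 3,775 images that cover a range of phenological stages and were captured with a variety of systems and platforms under a wide range of illumination conditions. By aggregating sub-datasets from different projects and institutions, VegAnn represents a broad spectrum of measurement conditions, crop species, and development stages.

Key points

Contains 3,775 images
Images are 512x512 pixels
Corresponding binary masks: 0 for soil and crop residues (background), 255 for vegetation (foreground)
Includes 26+ crop species, which are not evenly represented
Images were captured outdoors with different acquisition systems and configurations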
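The mask encoding listed above (0 for background, 255 for vegetation) can be turned into a per-image vegetation fraction. The following is a minimal sketch, assuming the mask has already been decoded into a flat sequence of single-channel pixel values; the function name `vegetation_fraction` and the toy mask are illustrative and not part of the dataset.

```python
from typing import Iterable

BACKGROUND = 0    # soil and crop residues, per the dataset card
VEGETATION = 255  # vegetation, per the dataset card


def vegetation_fraction(pixels: Iterable[int]) -> float:
    """Return the share of pixels labeled as vegetation in a flat mask."""
    total = 0
    vegetation = 0
    for value in pixels:
        # Reject anything that is not one of the two documented labels.
        if value not in (BACKGROUND, VEGETATION):
            raise ValueError(f"unexpected mask value: {value}")
        total += 1
        if value == VEGETATION:
            vegetation += 1
    if total == 0:
        raise ValueError("mask is empty")
    return vegetation / total


# Toy 2x4 mask flattened row by row: 3 of 8 pixels are vegetation.
toy_mask = [0, 255, 255, 0,
            0, 0, 255, 0]
print(vegetation_fraction(toy_mask))  # 0.375
```

The strict value check is a design choice: it surfaces masks that were decoded or resampled in a way that introduces intermediate gray levels, rather than silently counting them as background.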

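Data structure

Data instances

Each VegAnn data instance consists of a 512x512 pixel RGB image patch extracted from a larger raw image. The patches are intended to provide enough detail to distinguish vegetation from background, which is essential for semantic segmentation and other computer vision analyses in agricultural settings.

Data fields

Name: unique identifier for each image patch
System: the imaging system used to acquire the photo (e.g., handheld cameras, DHP, UAV)
Orientation: the camera's orientation during image capture (e.g., nadir, 45 degrees)
latitude and longitude: geographic coordinates where the image was taken
date: date of image acquisition
LocAcc: location accuracy flag (1 for high accuracy, 0 for low or uncertain accuracy)
Species: the crop species in the image (e.g., wheat, maize, soybean)
Owner: the institution or entity that provided the image (e.g., Arvalis, INRAe)
Dataset-Name: the sub-dataset or project from which the image originates (e.g., Phenomobile, Easypcc)
TVT-split1 to TVT-split5: fields indicating the train/validation/test split configurations, supporting various experimental setups

Data splits

The dataset is divided into multiple splits (indicated by the TVT-split fields) to support different training, validation, and testing scenarios in machine learning workflows.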
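One way to use the TVT-split fields is to group rows by the value stored in the chosen split column. The sketch below assumes each row is available as a plain dictionary keyed by field name. The label strings stored in the real columns are not given in the source, so the values `"train"` and `"test"` in the toy rows are placeholders, and `group_by_split` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

Row = Mapping[str, Any]


def group_by_split(rows: Iterable[Row],
                   split_column: str = "TVT-split1") -> Dict[str, List[Row]]:
    """Group rows by whatever label the chosen split column holds."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[split_column]].append(row)
    return dict(groups)


# Toy rows; the split labels are placeholders, not the dataset's real values.
rows = [
    {"Name": "patch_a", "TVT-split1": "train", "TVT-split2": "test"},
    {"Name": "patch_b", "TVT-split1": "train", "TVT-split2": "train"},
    {"Name": "patch_c", "TVT-split1": "test", "TVT-split2": "train"},
]

by_split1 = group_by_split(rows, "TVT-split1")
print({label: len(members) for label, members in by_split1.items()})
```

Because the function does not hard-code label names, it also serves as a quick way to discover which labels each of the five split columns actually uses.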

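Dataset creation

Curation rationale

The VegAnn dataset was developed to fill a gap in the datasets available for training convolutional neural networks (CNNs) on semantic segmentation in real-world agricultural environments. By including images from a wide range of conditions and crop development stages, VegAnn aims to improve the performance of segmentation algorithms, promote benchmarking, and advance research on large-scale crop vegetation segmentation.

Source data

Initial data collection and normalization

The images in VegAnn come from multiple sub-datasets contributed by different institutions, each captured under a specific acquisition configuration. The images were then standardized into 512x512 pixel patches to keep the dataset consistent.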
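The source does not describe how the 512x512 patches were cut from the raw images. Purely as an illustration of what such a standardization step could look like, the sketch below computes non-overlapping patch boxes on a regular grid and drops any partial patch at the right and bottom edges. This grid strategy and the helper `patch_boxes` are assumptions, not the authors' procedure.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def patch_boxes(width: int, height: int, patch: int = 512) -> List[Box]:
    """List non-overlapping patch boxes that fit fully inside the image."""
    if patch <= 0:
        raise ValueError("patch size must be positive")
    boxes: List[Box] = []
    # Step through the image in strides of one patch; incomplete edge
    # patches are skipped because the range stops at the last full one.
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            boxes.append((left, top, left + patch, top + patch))
    return boxes


# A hypothetical 1200x1024 raw image yields a 2x2 grid of full patches.
boxes = patch_boxes(1200, 1024)
print(len(boxes))  # 4
print(boxes[0], boxes[-1])
```

The `(left, upper, right, lower)` ordering matches the box convention used by common imaging libraries, so each tuple can be passed to a crop call directly.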

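Source data providers

The data was provided through a collaboration of institutions including Arvalis, INRAe, The University of Tokyo, University of Queensland, NEON, and EOLAB.

Annotations

Annotation process

Annotation focused on distinguishing vegetation from background in the images. The process ensured that the images offered sufficient spatial resolution for accurate visual segmentation.

Annotators

The annotations were performed by a team of researchers and domain experts from the contributing institutions.

Considerations for using the dataset

Social impact of the dataset

The VegAnn dataset is expected to have a significant impact on agricultural research and commercial applications by improving the accuracy of crop monitoring, disease detection, and yield estimation through better vegetation segmentation techniques.

Discussion of biases

Given the diverse sources of the images, there may be inherent biases towards certain crop types, geographic locations, and imaging conditions. Users should take this diversity into account in their applications and analyses.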
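A simple first check for the imbalance mentioned above is to count how many rows each crop species contributes. The sketch below assumes rows are plain dictionaries keyed by field name; the helper `species_shares` and the toy rows are illustrative and do not reflect the dataset's real distribution.

```python
from collections import Counter
from typing import Any, Iterable, List, Mapping, Tuple


def species_shares(rows: Iterable[Mapping[str, Any]],
                   column: str = "Species") -> List[Tuple[str, int, float]]:
    """Return (value, count, share) tuples, most frequent first."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]


# Toy rows only; the real per-species counts are not given in the source.
toy_rows = [
    {"Species": "Wheat"},
    {"Species": "Wheat"},
    {"Species": "Wheat"},
    {"Species": "Maize"},
]

for name, count, share in species_shares(toy_rows):
    print(f"{name}: {count} images ({share:.0%})")
```

Passing a different `column`, such as `"System"` or `"Owner"`, gives the same breakdown for acquisition systems or contributing institutions.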

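Licensing information

Please refer to the specific licensing agreements of the contributing institutions, or contact the dataset providers for more information on usage rights and restrictions.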


Additional information

Dataset curators: Simon Madec et al.
Version: 1.0
License: CC-BY
Contact: simon.madec@cirad.fr

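Collected summary

Dataset introduction

Construction

VegAnn was built by combining data from multiple sources. It integrates sub-datasets from several research institutions and projects, covering 26+ crop species across different phenological stages, varied illumination conditions, and multiple imaging systems. The raw images were standardized into 512x512 pixel RGB patches, each paired with a binary mask that separates vegetation from background. Annotation was carried out jointly by researchers and domain experts from the participating institutions, providing training and evaluation data for semantic segmentation.

Features

The defining features of VegAnn are its diversity and its detailed metadata. The dataset contains 3,775 images, acquired with systems ranging from handheld cameras to UAVs and with camera orientations ranging from nadir to oblique. Each image carries more than ten metadata fields, including geographic coordinates, acquisition date, crop species, and data provider, together with five alternative train/validation/test split schemes that support flexible experimental designs. This structure makes the dataset suitable not only for basic vegetation segmentation but also for cross-disciplinary work such as geospatial analysis and crop growth modeling.

Usage

Researchers can use the structured metadata fields to filter the data and build targeted subsets. The dataset is organized in a standard image segmentation format, so images and masks can be used directly to train convolutional neural network models. The TVT-split1 to TVT-split5 fields allow different partitioning strategies for training and validation, which helps assess how well an algorithm generalizes across diverse agricultural scenes. The dataset is compatible with mainstream deep learning frameworks and can be loaded through the Hugging Face platform, which supports algorithm development and benchmarking in agricultural image analysis.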
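The filtering and loading steps described above can be sketched as follows. The repository ID `simonMadec/VegAnn` and the `train` split come from the dataset card; `load_vegann` and `select_rows` are hypothetical helpers, and the exact spelling of metadata values such as `"Wheat"` or `"UAV"` in the real data is an assumption based on the examples in the card. Loading requires the `datasets` package and a download of roughly 1.9 GB, so the loader is defined but not called here.

```python
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def select_rows(rows: Iterable[Row], criteria: Mapping[str, Any]) -> List[Row]:
    """Keep rows whose fields equal every value given in `criteria`."""
    return [
        row for row in rows
        if all(row.get(field) == wanted for field, wanted in criteria.items())
    ]


def load_vegann():
    """Load VegAnn from the Hugging Face Hub (needs `datasets` and network)."""
    from datasets import load_dataset  # imported lazily: optional dependency
    return load_dataset("simonMadec/VegAnn", split="train")


# Toy metadata rows standing in for real dataset rows.
toy_rows = [
    {"Name": "p1", "Species": "Wheat", "System": "UAV"},
    {"Name": "p2", "Species": "Wheat", "System": "DHP"},
    {"Name": "p3", "Species": "Maize", "System": "UAV"},
]

wheat_uav = select_rows(toy_rows, {"Species": "Wheat", "System": "UAV"})
print([row["Name"] for row in wheat_uav])  # ['p1']
```

With the real dataset, the same selection can be expressed on the loaded object, for example `load_vegann().filter(lambda row: row["Species"] == "Wheat")`. A dictionary is used for the criteria because field names such as `Dataset-Name` contain hyphens and cannot be passed as keyword arguments.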

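Background and challenges

Background overview

At the intersection of agricultural informatics and computer vision, accurate vegetation segmentation is a key step in estimating crop canopy traits. Existing datasets were mostly acquired in controlled environments and generalize poorly to real field scenes. To address this limitation, Simon Madec and colleagues, together with Arvalis, INRAe, The University of Tokyo, and other institutions, built the VegAnn dataset in 2023. It brings together 3,775 multi-crop RGB images covering 26+ crop species, different phenological stages, and a variety of acquisition systems and illumination conditions. The goal is to give convolutional neural networks broadly representative training data and to advance both algorithm performance and benchmarking in large-scale crop vegetation segmentation research.

Current challenges

VegAnn targets the generalization problem of vegetation semantic segmentation in real agricultural environments. The core challenge is that field images vary widely in crop species, growth stage, viewing angle, and illumination, which makes segmentation model performance unstable. Building the dataset involved several difficulties: heterogeneous image data from different projects and institutions had to be integrated, and spatial resolution and annotation rules had to be standardized. In addition, the uneven distribution of crop species and the biases in geography and acquisition conditions may introduce bias, so users should evaluate carefully how well their models apply to different agricultural scenarios.

Common scenarios

Typical use cases

In agricultural computer vision, VegAnn supports the task of vegetation semantic segmentation. The dataset gathers RGB images from different crop species, growth stages, and imaging conditions, and its typical use is to train and evaluate convolutional neural network models that classify each pixel of a field image as vegetation or background. By providing diverse real-world samples, VegAnn lets models learn to cope with complex outdoor illumination changes, differences in camera angle, and variation in crop morphology, which improves the generalization and robustness of segmentation algorithms in variable agricultural environments.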
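Evaluating a segmentation model against the ground-truth masks needs a pixel-level metric. The source does not prescribe one; intersection over union (IoU) for the vegetation class is a common choice and is used here as an assumption. The sketch takes flat sequences of mask values that follow the 0/255 encoding from the dataset card; `vegetation_iou` is a hypothetical helper.

```python
from typing import Sequence


def vegetation_iou(predicted: Sequence[int], truth: Sequence[int],
                   positive: int = 255) -> float:
    """Intersection over union for the vegetation class of two flat masks."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = 0
    union = 0
    for p, t in zip(predicted, truth):
        p_veg = p == positive
        t_veg = t == positive
        if p_veg and t_veg:
            intersection += 1
        if p_veg or t_veg:
            union += 1
    # Convention chosen here: two masks with no vegetation agree perfectly.
    return intersection / union if union else 1.0


# Toy example: one pixel overlaps, three pixels are vegetation in either mask.
truth_mask = [0, 255, 255, 0]
predicted_mask = [0, 255, 0, 255]
print(round(vegetation_iou(predicted_mask, truth_mask), 3))  # 0.333
```

Averaging this score per species or per acquisition system, rather than over the whole test set, makes it easier to see where a model underperforms on the less represented crops.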

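Practical applications

In agricultural production and management, techniques built on VegAnn have broad potential. Vegetation segmentation models trained on the dataset can be integrated into UAVs, ground-based mobile platforms, or handheld devices to automate monitoring of field crop cover, biomass estimation, and early stress detection. These applications can help optimize irrigation decisions, precision fertilization, and pest and disease management, improving resource use efficiency and crop yield. The dataset also provides a data foundation for lightweight mobile diagnostic tools aimed at smallholder farmers, supporting wider adoption of smart agriculture.

Derived and related work

Since its release, VegAnn has prompted a body of research on agricultural image segmentation. For example, researchers have used the dataset to adapt mainstream segmentation architectures such as U-Net and DeepLab, proposing enhanced models that address blurred crop edges and shadow interference. Other work has focused on class imbalance and domain differences within the dataset, developing transfer learning or domain adaptation methods to improve performance on unseen crops or in new environments. These studies deepen the understanding of agricultural vision tasks and lay algorithmic groundwork for more robust and scalable field perception systems.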

The content above was collected and summarized by 遇见数据集.
