LifeScienceModel/VegAnn

Name: LifeScienceModel/VegAnn
Creator: LifeScienceModel
Published: 2024-02-02 13:08:57
License: No description available

Hugging Face, updated 2024-02-02, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/LifeScienceModel/VegAnn

Resource description:

---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: mask
    dtype: image
  - name: System
    dtype: string
  - name: Orientation
    dtype: string
  - name: latitude
    dtype: float64
  - name: longitude
    dtype: float64
  - name: date
    dtype: string
  - name: LocAcc
    dtype: int64
  - name: Species
    dtype: string
  - name: Owner
    dtype: string
  - name: Dataset-Name
    dtype: string
  - name: TVT-split1
    dtype: string
  - name: TVT-split2
    dtype: string
  - name: TVT-split3
    dtype: string
  - name: TVT-split4
    dtype: string
  - name: TVT-split5
    dtype: string
  splits:
  - name: train
    num_bytes: 1896819757.9
    num_examples: 3775
  download_size: 1940313757
  dataset_size: 1896819757.9
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# VegAnn Dataset 😄

## Dataset Description 📖

VegAnn, short for Vegetation Annotation, is a meticulously curated collection of 3,775 multi-crop RGB images aimed at enhancing research in crop vegetation segmentation. These images span various phenological stages and were captured using diverse systems and platforms under a wide range of illumination conditions. By aggregating sub-datasets from different projects and institutions, VegAnn represents a broad spectrum of measurement conditions, crop species, and development stages.

### Languages 🌐

The annotations and documentation are primarily in English.

## Dataset Structure 🏗

### Data Instances 📸

A VegAnn data instance consists of a 512x512 pixel RGB image patch derived from larger raw images. These patches are designed to provide sufficient detail for distinguishing between vegetation and background, crucial for applications in semantic segmentation and other forms of computer vision analysis in agricultural contexts.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645a05f09e55477fff862881/O-iKRqn8FRZnY9hBzmaU5.png)

### Data Fields 📋

- `Name`: Unique identifier for each image patch.
- `System`: The imaging system used to acquire the photo (e.g., Handheld Cameras, DHP, UAV).
- `Orientation`: The camera's orientation during image capture (e.g., Nadir, 45 degrees).
- `latitude` and `longitude`: Geographic coordinates where the image was taken.
- `date`: Date of image acquisition.
- `LocAcc`: Location accuracy flag (1 for high accuracy, 0 for low or uncertain accuracy).
- `Species`: The crop species featured in the image (e.g., Wheat, Maize, Soybean).
- `Owner`: The institution or entity that provided the image (e.g., Arvalis, INRAe).
- `Dataset-Name`: The sub-dataset or project from which the image originates (e.g., Phenomobile, Easypcc).
- `TVT-split1` to `TVT-split5`: Fields indicating the train/validation/test split configurations, facilitating various experimental setups.

### Data Splits 📊

The dataset is structured into multiple splits (as indicated by `TVT-split` fields) to support different training, validation, and testing scenarios in machine learning workflows.

## Dataset Creation 🛠

### Curation Rationale 🤔

The VegAnn dataset was developed to address the gap in available datasets for training convolutional neural networks (CNNs) for the task of semantic segmentation in real-world agricultural environments. By incorporating images from a wide array of conditions and stages of crop development, VegAnn aims to enhance the performance of segmentation algorithms, promote benchmarking, and foster research on large-scale crop vegetation segmentation.

### Source Data 🌱

#### Initial Data Collection and Normalization

Images within VegAnn were sourced from various sub-datasets contributed by different institutions, each under specific acquisition configurations. These were then standardized into 512x512 pixel patches to maintain consistency across the dataset.

#### Who are the source data providers?

The data was provided by a collaboration of institutions including Arvalis, INRAe, The University of Tokyo, University of Queensland, NEON, and EOLAB, among others.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645a05f09e55477fff862881/W7rF7P9oexd-Q7oBGV6aF.png)

### Annotations 📝

#### Annotation process

Annotations for the dataset were focused on distinguishing between vegetation and background within the images. The process ensured that the images offered sufficient spatial resolution to allow for accurate visual segmentation.

#### Who are the annotators?

The annotations were performed by a team comprising researchers and domain experts from the contributing institutions.

## Considerations for Using the Data 🤓

### Social Impact of Dataset 🌍

The VegAnn dataset is expected to significantly impact agricultural research and commercial applications by enhancing the accuracy of crop monitoring, disease detection, and yield estimation through improved vegetation segmentation techniques.

### Discussion of Biases 🧐

Given the diverse sources of the images, there may be inherent biases towards certain crop types, geographical locations, and imaging conditions. Users should consider this diversity in applications and analyses.

### Licensing Information 📄

Please refer to the specific licensing agreements of the contributing institutions or contact the dataset providers for more information on usage rights and restrictions.

## Citation Information 📚

If you use the VegAnn dataset in your research, please cite the following:

```
@article{madec_vegann_2023,
  title = {{VegAnn}, {Vegetation} {Annotation} of multi-crop {RGB} images acquired under diverse conditions for segmentation},
  volume = {10},
  issn = {2052-4463},
  url = {https://doi.org/10.1038/s41597-023-02098-y},
  doi = {10.1038/s41597-023-02098-y},
  abstract = {Applying deep learning to images of cropping systems provides new knowledge and insights in research and commercial applications. Semantic segmentation or pixel-wise classification, of RGB images acquired at the ground level, into vegetation and background is a critical step in the estimation of several canopy traits. Current state of the art methodologies based on convolutional neural networks (CNNs) are trained on datasets acquired under controlled or indoor environments. These models are unable to generalize to real-world images and hence need to be fine-tuned using new labelled datasets. This motivated the creation of the VegAnn - Vegetation Annotation - dataset, a collection of 3775 multi-crop RGB images acquired for different phenological stages using different systems and platforms in diverse illumination conditions. We anticipate that VegAnn will help improving segmentation algorithm performances, facilitate benchmarking and promote large-scale crop vegetation segmentation research.},
  number = {1},
  journal = {Scientific Data},
  author = {Madec, Simon and Irfan, Kamran and Velumani, Kaaviya and Baret, Frederic and David, Etienne and Daubige, Gaetan and Samatan, Lucas Bernigaud and Serouart, Mario and Smith, Daniel and James, Chrisbin and Camacho, Fernando and Guo, Wei and De Solan, Benoit and Chapman, Scott C. and Weiss, Marie},
  month = may,
  year = {2023},
  pages = {302},
}
```

## Additional Information

- **Dataset Curators**: Simon Madec et al.
- **Version**: 1.0
- **License**: Specified by each contributing institution
- **Contact**: TBD
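The YAML header of the card lists the column names and a single `train` split, which is enough to sketch how the dataset might be loaded and sanity-checked. The snippet below is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID `LifeScienceModel/VegAnn` (taken from the download link) is reachable. `load_vegann` and `missing_columns` are hypothetical helper names, not part of any library.

```python
# Column names copied from the `dataset_info.features` block of the card.
EXPECTED_COLUMNS = [
    "image", "mask", "System", "Orientation", "latitude", "longitude",
    "date", "LocAcc", "Species", "Owner", "Dataset-Name",
    "TVT-split1", "TVT-split2", "TVT-split3", "TVT-split4", "TVT-split5",
]


def missing_columns(column_names, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from `column_names`."""
    present = set(column_names)
    return [name for name in expected if name not in present]


def load_vegann(repo_id="LifeScienceModel/VegAnn"):
    """Download the `train` split (roughly 1.9 GB according to the card).

    Assumption: the `datasets` package is installed and the Hub is reachable.
    The import is local so that the helper above works without it.
    """
    from datasets import load_dataset

    dataset = load_dataset(repo_id, split="train")
    absent = missing_columns(dataset.column_names)
    if absent:
        raise ValueError(f"columns missing from the download: {absent}")
    return dataset


# Usage (not executed here because it triggers the download):
#   vegann = load_vegann()
#   print(vegann.num_rows)  # the card lists 3775 examples
```

Checking the column names right after loading is a cheap way to notice when a mirror or a later revision of the dataset differs from the card.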


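Provider:

LifeScienceModel

Summary of original information

VegAnn dataset overview

Dataset description

VegAnn (Vegetation Annotation) is a curated collection of 3,775 multi-crop RGB images intended to support research on crop vegetation segmentation. The images cover different phenological stages and were captured with a variety of systems and platforms under a wide range of illumination conditions. By pooling sub-datasets from different projects and institutions, VegAnn covers a broad range of measurement conditions, crop species, and development stages.

Languages

The annotations and documentation are primarily in English.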

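Dataset structure

Data instances

Each VegAnn data instance contains a 512x512 pixel RGB image patch extracted from a larger raw image. The patches are meant to carry enough detail to separate vegetation from background in semantic segmentation and other computer vision analyses in agricultural settings.

Data fields

Name: unique identifier of each image patch.
System: imaging system used to acquire the photo (e.g., handheld camera, DHP, UAV).
Orientation: orientation of the camera at capture time (e.g., nadir, 45 degrees).
latitude and longitude: geographic coordinates where the image was taken.
date: date of image acquisition.
LocAcc: location accuracy flag (1 for high accuracy, 0 for low or uncertain accuracy).
Species: crop species shown in the image (e.g., wheat, maize, soybean).
Owner: institution or entity that provided the image (e.g., Arvalis, INRAe).
Dataset-Name: sub-dataset or project the image comes from (e.g., Phenomobile, Easypcc).
TVT-split1 to TVT-split5: fields that encode train/validation/test split configurations for different experimental setups.

Data splits

The dataset is organized into several splits (indicated by the TVT-split fields) to support different training, validation, and testing scenarios in machine learning workflows.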
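The five TVT-split columns each assign every patch to one part of a train/validation/test partition. A minimal sketch of how rows could be grouped by one of those columns is shown below, using plain dictionaries in place of real metadata rows. `group_by_split` is a hypothetical helper, and the label strings in `toy_records` are placeholders, since the card does not spell out the exact values stored in these columns.

```python
from collections import defaultdict


def group_by_split(records, split_column="TVT-split1"):
    """Group record indices by the label found in `split_column`.

    Labels are taken from the data as-is, so nothing about their
    spelling is hard-coded here.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record[split_column]].append(index)
    return dict(groups)


# Hypothetical metadata rows; the label strings are placeholders.
toy_records = [
    {"Species": "Wheat", "TVT-split1": "Training", "TVT-split2": "Test"},
    {"Species": "Maize", "TVT-split1": "Test", "TVT-split2": "Training"},
    {"Species": "Wheat", "TVT-split1": "Training", "TVT-split2": "Validation"},
]

print(group_by_split(toy_records, "TVT-split1"))
print(group_by_split(toy_records, "TVT-split2"))
```

Running the same model under each of the five columns in turn gives five different partitions of the same images, which is one way to check that a result does not depend on a single lucky split.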

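Dataset creation

Curation rationale

VegAnn was developed to fill a gap in the datasets available for training convolutional neural networks (CNNs) on semantic segmentation in real-world agricultural environments. By including images from a wide range of conditions and crop development stages, VegAnn aims to improve the performance of segmentation algorithms, support benchmarking, and encourage research on large-scale crop vegetation segmentation.

Source data

Initial data collection and normalization

The images in VegAnn come from sub-datasets contributed by different institutions, each acquired under its own acquisition configuration. The images were then standardized into 512x512 pixel patches to keep the dataset consistent.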
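The text does not say how the 512x512 patches were cut from the raw images. As an illustration only, and not necessarily the curators' procedure, the sketch below computes the boxes of a simple non-overlapping grid over a raw image of a given size. `tile_boxes` is a hypothetical helper name.

```python
def tile_boxes(width, height, size=512):
    """Return (left, top, right, bottom) boxes of non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped, so an image
    smaller than `size` in either dimension yields no boxes.
    """
    boxes = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            boxes.append((left, top, left + size, top + size))
    return boxes


print(len(tile_boxes(1024, 1536)))
print(tile_boxes(1000, 600))
```

The boxes follow the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each one could be passed to it directly to cut the matching patch from both the image and its mask.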

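Source data providers

The data was provided through a collaboration of institutions including Arvalis, INRAe, the University of Tokyo, the University of Queensland, NEON, and EOLAB.

Annotations

Annotation process

Annotation focused on separating vegetation from background in each image. The process ensured that the images had enough spatial resolution for accurate visual segmentation.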
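Because each mask separates vegetation from background, a quantity such as the vegetation fraction of a patch follows directly from it. The sketch below is a minimal pure-Python version that treats the mask as a list of pixel rows. It assumes that any non-zero value marks vegetation, which the source does not confirm, and `vegetation_fraction` is a hypothetical helper name.

```python
def vegetation_fraction(mask_rows):
    """Fraction of pixels labeled vegetation in a 2-D mask.

    Assumption: any non-zero value means vegetation, zero means background.
    """
    total = 0
    vegetation = 0
    for row in mask_rows:
        for value in row:
            total += 1
            if value != 0:
                vegetation += 1
    if total == 0:
        raise ValueError("mask is empty")
    return vegetation / total


# Toy 2x2 mask with three vegetation pixels out of four.
toy_mask = [
    [0, 255],
    [255, 255],
]
print(vegetation_fraction(toy_mask))
```

It is worth inspecting the actual pixel values of a few masks before relying on the non-zero convention, since the encoding may differ from what is assumed here.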

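Annotators

The annotations were produced by a team of researchers and domain experts from the contributing institutions.

Considerations for using the data

Social impact of the dataset

By improving the accuracy of vegetation segmentation, VegAnn is expected to have a significant impact on agricultural research and commercial applications such as crop monitoring, disease detection, and yield estimation.

Discussion of biases

Because the images come from diverse sources, the dataset may be biased towards certain crop types, geographic locations, and imaging conditions. Users should take this into account in their applications and analyses.

Licensing information

Refer to the specific licensing agreements of the contributing institutions, or contact the dataset providers, for more information on usage rights and restrictions.

Citation information

If you use the VegAnn dataset in your research, please cite the following: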

@article{madec_vegann_2023, title = {{VegAnn}, {Vegetation} {Annotation} of multi-crop {RGB} images acquired under diverse conditions for segmentation}, volume = {10}, issn = {2052-4463}, url = {https://doi.org/10.1038/s41597-023-02098-y}, doi = {10.1038/s41597-023-02098-y}, abstract = {Applying deep learning to images of cropping systems provides new knowledge and insights in research and commercial applications. Semantic segmentation or pixel-wise classification, of RGB images acquired at the ground level, into vegetation and background is a critical step in the estimation of several canopy traits. Current state of the art methodologies based on convolutional neural networks (CNNs) are trained on datasets acquired under controlled or indoor environments. These models are unable to generalize to real-world images and hence need to be fine-tuned using new labelled datasets. This motivated the creation of the VegAnn - Vegetation Annotation - dataset, a collection of 3775 multi-crop RGB images acquired for different phenological stages using different systems and platforms in diverse illumination conditions. We anticipate that VegAnn will help improving segmentation algorithm performances, facilitate benchmarking and promote large-scale crop vegetation segmentation research.}, number = {1}, journal = {Scientific Data}, author = {Madec, Simon and Irfan, Kamran and Velumani, Kaaviya and Baret, Frederic and David, Etienne and Daubige, Gaetan and Samatan, Lucas Bernigaud and Serouart, Mario and Smith, Daniel and James, Chrisbin and Camacho, Fernando and Guo, Wei and De Solan, Benoit and Chapman, Scott C. and Weiss, Marie}, month = may, year = {2023}, pages = {302}, }

Additional information

Dataset curators: Simon Madec et al.
Version: 1.0
License: specified by each contributing institution
Contact: TBD

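Collected summary

Dataset introduction

Construction

VegAnn was built by integrating data from multiple sources: 3,775 multi-crop RGB images from different institutions and projects, covering several crops at different growth stages and under different illumination conditions. The images were standardized into 512x512 pixel patches so that they carry enough detail for semantic segmentation. The dataset is intended to fill the gap in data for training convolutional neural networks (CNNs) on semantic segmentation in agricultural environments, with the variety of sources and conditions meant to improve both the performance and the generalization of segmentation algorithms.

Features

The main strength of VegAnn is its diversity and breadth. It contains images from different institutions, crop species, growth stages, and illumination conditions. It also provides per-image metadata, including geographic location, acquisition date, and imaging system, which gives researchers context for more precise analysis and model training.

Usage

VegAnn suits a range of machine learning tasks, in particular semantic segmentation of crop vegetation. Users can train and evaluate models with the provided train/validation/test split configurations (TVT-split). The images and their masks can be used directly to train convolutional neural networks (CNNs) or other deep learning models. To use the data lawfully, users should follow the license agreements of the contributing institutions and cite the associated publication in their research.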
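Evaluating a segmentation model on a held-out split needs a metric that compares a predicted mask with the annotated one. Intersection over union (IoU) is a common choice for this. The sketch below is a minimal pure-Python version for the vegetation class, under the assumption that non-zero pixels mark vegetation. `binary_iou` is a hypothetical helper name, and the source does not prescribe a particular metric.

```python
def binary_iou(predicted, target):
    """Intersection over union of the vegetation class for two 2-D masks.

    Non-zero values count as vegetation. Two masks without any vegetation
    are scored 1.0, a convention chosen here rather than taken from the
    source.
    """
    if len(predicted) != len(target):
        raise ValueError("masks differ in height")
    intersection = 0
    union = 0
    for predicted_row, target_row in zip(predicted, target):
        if len(predicted_row) != len(target_row):
            raise ValueError("masks differ in width")
        for p, t in zip(predicted_row, target_row):
            p_on = p != 0
            t_on = t != 0
            intersection += int(p_on and t_on)
            union += int(p_on or t_on)
    return 1.0 if union == 0 else intersection / union


# Toy masks: one shared vegetation pixel, two vegetation pixels overall.
toy_prediction = [[1, 1], [0, 0]]
toy_target = [[1, 0], [0, 0]]
print(binary_iou(toy_prediction, toy_target))
```

Averaging this score over all patches of the test part of one TVT-split column gives a single number per split configuration that can be compared across models.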

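Background and challenges

Background overview

VegAnn, short for Vegetation Annotation, is an image dataset for crop vegetation segmentation research, described by Simon Madec et al. in a 2023 article in Scientific Data. It brings together 3,775 multi-crop RGB images from several research institutions and projects, covering different crop growth stages and varied illumination conditions. By combining sub-datasets from Arvalis, INRAe, the University of Tokyo, the University of Queensland, and other institutions, VegAnn aims to fill the gap in datasets for training convolutional neural networks (CNNs) on semantic segmentation in real agricultural environments. It is intended to improve the performance of segmentation algorithms and to support benchmarking and wider research on large-scale crop vegetation segmentation.

Current challenges

Building VegAnn involved several challenges. First, because the images come from different acquisition systems and platforms, keeping the data standardized and consistent is difficult. Second, the images were taken under different illumination conditions and at different crop growth stages, which makes the segmentation task harder. Third, although the diversity of the dataset broadens its applicability, it may also introduce bias towards certain crop types or geographic regions. Finally, annotating vegetation versus background accurately and consistently is itself a significant challenge.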

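Common scenarios

Typical use cases

VegAnn is typically used for crop vegetation segmentation in agriculture. It provides RGB images together with their vegetation masks, which researchers and developers can use to train and validate convolutional neural networks (CNNs) across different illumination conditions, crop growth stages, and geographic settings. The diversity of the images makes VegAnn well suited to developing and testing vegetation segmentation algorithms, particularly where accurate segmentation matters, such as crop health monitoring and yield prediction.

Derived and related work

According to this summary, VegAnn has been used as the basis for further research, for example new vegetation segmentation models evaluated across different crops and environmental conditions. The summary also credits the dataset with supporting work on image processing for agricultural remote sensing, and with cross-domain explorations such as applying vegetation segmentation to ecosystem monitoring and urban greening assessment. It does not cite specific publications for these claims.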

Recent research on the dataset