paren8esis/S4A
Source: Hugging Face. Updated: 2023-10-24. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/paren8esis/S4A
Resource description:
## Dataset Description
- **Homepage:** [www.sen4agrinet.space.noa.gr](https://www.sen4agrinet.space.noa.gr/)
- **Repository:** [github.com/Orion-AI-Lab/S4A](https://github.com/Orion-AI-Lab/S4A)
- **Paper:** ["A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning" (D. Sykas, M. Sdraka, D. Zografakis, I. Papoutsis)](https://arxiv.org/abs/2204.00951)
### Dataset Summary
Sen4AgriNet is a Sentinel-2-based, multi-country time-series benchmark dataset tailored for agricultural monitoring applications with machine and deep learning. It is annotated from farmer declarations collected via the Land Parcel Identification System (LPIS) to provide harmonized country-wide labels. These declarations have only recently been made available as open data, allowing satellite imagery to be labelled from ground-truth data for the first time. We propose and standardise a new crop type taxonomy across Europe that addresses Common Agricultural Policy (CAP) needs, based on the Food and Agriculture Organization (FAO) Indicative Crop Classification scheme. Sen4AgriNet is the only multi-country, multi-year dataset that includes all spectral information. The current version covers 2019-2020 for Catalonia and 2019 for France, and it can be extended to include additional countries.
### Languages
All information in the dataset is in English (`en_GB`).
## Dataset Structure
### Data Instances
A typical sample in Sen4AgriNet consists of the following fields:
```
{
'patch_full_name': '2019_31TCF_patch_10_14',
'patch_year': '2019',
'patch_name': 'patch_10_14',
'patch_country_code': 'ES',
'patch_tile': '31TCF',
'B01': array([...]),
'B02': array([...]),
'B03': array([...]),
'B04': array([...]),
'B05': array([...]),
'B06': array([...]),
'B07': array([...]),
'B08': array([...]),
'B09': array([...]),
'B10': array([...]),
'B11': array([...]),
'B12': array([...]),
'B8A': array([...]),
'parcels': array([...]),
'labels': array([...]),
'timestamp': [...]
}
```
### Data Fields
Below we provide a brief explanation of each field:
- `patch_full_name`: The full name of the patch.
- `patch_year`: The year of the observations included in the patch.
- `patch_name`: The name of the patch. It is of the form: `patch_xx_yy` where `xx` and `yy` are the indices of the patch inside the tile.
- `patch_country_code`: The country code of the observations included in the patch. Currently it is either `ES` for Catalonia or `FR` for France.
- `B01`, ..., `B8A`: Each one is an array containing the observations of the corresponding Sentinel-2 band. The shape of each array is (T, H, W) where T is the number of observations, H the height of the image and W the width of the image.
- `parcels`: A mask containing the code number of each parcel.
- `labels`: A mask containing the class codes for each crop in the taxonomy.
- `timestamp`: The timestamps of the observations.
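To make the field layout concrete, here is a minimal sketch that builds a synthetic sample (random values, not real data) and stacks bands of one resolution into a single `(T, C, H, W)` array. The band grouping by resolution follows the standard Sentinel-2 band definitions, and the patch sizes are those given in this card. The helper names `make_synthetic_sample` and `stack_bands`, the `uint16` dtype, the `(H, W)` shape of the masks and the timestamp format are assumptions made for illustration only.

```python
import numpy as np

# Standard Sentinel-2 band groups by spatial resolution (metres).
BANDS_BY_RESOLUTION = {
    10: ["B02", "B03", "B04", "B08"],
    20: ["B05", "B06", "B07", "B8A", "B11", "B12"],
    60: ["B01", "B09", "B10"],
}
# Patch side length in pixels per resolution, as stated in this card.
PATCH_SIZE = {10: 366, 20: 183, 60: 61}


def make_synthetic_sample(num_obs=4, seed=0):
    """Build a fake sample with the documented field names (random values)."""
    rng = np.random.default_rng(seed)
    sample = {
        "patch_full_name": "2019_31TCF_patch_10_14",
        "patch_year": "2019",
        "patch_name": "patch_10_14",
        "patch_country_code": "ES",
        "patch_tile": "31TCF",
    }
    for resolution, bands in BANDS_BY_RESOLUTION.items():
        side = PATCH_SIZE[resolution]
        for band in bands:
            # Each band is a (T, H, W) array.
            sample[band] = rng.integers(
                0, 10000, size=(num_obs, side, side), dtype=np.uint16
            )
    # Assumption: both masks are 2-D and aligned with the 10 m grid.
    sample["parcels"] = np.zeros((366, 366), dtype=np.int64)
    sample["labels"] = np.zeros((366, 366), dtype=np.int64)
    # Assumption: one timestamp per observation; the real format may differ.
    sample["timestamp"] = [f"2019-01-{day:02d}" for day in range(1, num_obs + 1)]
    return sample


def stack_bands(sample, resolution):
    """Stack all bands of one resolution into a (T, C, H, W) array."""
    bands = BANDS_BY_RESOLUTION[resolution]
    return np.stack([sample[band] for band in bands], axis=1)


sample = make_synthetic_sample(num_obs=4)
print(stack_bands(sample, 10).shape)
```

Bands of different resolutions cannot be stacked directly because their `H` and `W` differ; they would first need to be resampled to a common grid.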
### Data Splits
In this version of the dataset there are no predefined train/val/test splits so that the users can define their own.
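One possible way to define a reproducible split, shown here only as a sketch, is to hash the `patch_full_name` of each sample and map the hash to a split. The function name `assign_split` and the 80/10/10 proportions are assumptions, not part of the dataset.

```python
import hashlib


def assign_split(patch_full_name, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a patch name to 'train', 'val' or 'test'."""
    digest = hashlib.sha256(patch_full_name.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_fraction:
        return "test"
    if u < test_fraction + val_fraction:
        return "val"
    return "train"


print(assign_split("2019_31TCF_patch_10_14"))
```

A hash-based split ignores geography and time. When spatial or temporal generalisation is the goal, splitting on `patch_tile`, `patch_country_code` or `patch_year` may be more appropriate.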
### Data Configurations
There are the following configurations in the current version of Sen4AgriNet:
- `complete`: The complete Sen4AgriNet dataset.
- `cat_2019`: Only Catalonia data for 2019.
- `cat_2020`: Only Catalonia data for 2020.
- `fr_2019`: Only France data for 2019.
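A configuration would typically be selected through the second argument of `datasets.load_dataset`. The sketch below wraps that call in a hypothetical helper, `load_s4a`, that validates the configuration name first. The repository id is taken from the title of this page. Whether the repository loads without extra arguments (for example `trust_remote_code=True` on some versions of the `datasets` library) has not been verified here.

```python
VALID_CONFIGS = ("complete", "cat_2019", "cat_2020", "fr_2019")


def load_s4a(config="cat_2019", **kwargs):
    """Load one Sen4AgriNet configuration (hypothetical convenience wrapper)."""
    if config not in VALID_CONFIGS:
        raise ValueError(
            f"Unknown configuration {config!r}; expected one of {VALID_CONFIGS}"
        )
    # Imported lazily so that validation works without the library installed.
    from datasets import load_dataset

    return load_dataset("paren8esis/S4A", config, **kwargs)


# Example (downloads data, so it needs network access and disk space):
# ds = load_s4a("cat_2019")
```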
## Dataset Creation
### Curation Rationale
One of the major problems faced by researchers in the fields of Remote Sensing and AI is the absence of country-wide labelled data that are harmonized along space and time. Specifically in the EU, the Common Agricultural Policy (CAP) has placed a stepping stone to overcome this issue by legally establishing Paying Agencies in each EU country, which are responsible for distributing subsidies to farmers. In order to fulfill their objectives, Paying Agencies systematically collect the cultivated crop type and parcel geometries for every farmer and record them via the Land Parcel Identification System (LPIS) in a standardized way for each country. Unfortunately, public access to these farmer declarations has been restricted for several years, making it almost impossible to get country-wide ground truth data. However, since 2019 and for the first time, these datasets are gradually becoming open (e.g. France, Catalonia, Estonia, Croatia, Slovenia, Slovakia and Luxembourg). This change offers a significant opportunity for the Earth Observation (EO) community to explore novel and innovative data-driven agricultural applications by exploiting this abundance of new LPIS information.
In principle, this fusion of the LPIS data sources has tremendous potential, but there are still some barriers to overcome. First of all, the LPIS system of each country is custom-configured to use the local language for the crop types and a specific crop taxonomy structure that matches the local implementation of the subsidies policy. This non-standardization of the labels prohibits the spatial generalization of Deep Learning (DL) models and thus needs to be carefully handled to achieve a common representation that is consistent among countries. On top of these contextual/semantic barriers, parcels are mapped in the corresponding national cartographic projection, which in all cases is different from the cartographic projection of the satellite images and poses an additional challenge for the preparation of a consistent, proper and at-scale DL-ready dataset.
Aiming to overcome the above limitations, in this repository we offer Sen4AgriNet, a unique benchmark EO dataset for agricultural monitoring with the following key characteristics:
- it is **pixel based** to capture spatial parcel variability
- it is **multi-temporal** to capture the crop phenology phases
- it is **multi-annual** to model the seasonal variability
- it is **multi-country** to model the geographic spatial variability
- it is **object-aggregated** to further incorporate ground truth data (parcel geometries) in the process
- it is **modular** since it can be enlarged with parcels from more EU countries or expanded in a straightforward way to include additional sensor and non-EO data (e.g. meteorological data)
### Source Data
1) The LPIS data for the region of Catalonia for 2019–2020, provided by the "Agricultura, Ramaderia, Pesca i Alimentacio" under an Open Data Commons Attribution License.
2) France LPIS data for 2019, provided by the French Paying Agency under an Open Data Commons Attribution License.
3) All Sentinel-2 L1C images with less than 10% cloud coverage for the above tiles.
#### Initial Data Collection and Normalization
The Sentinel-2 L1C images were downloaded from Copernicus and each image was split into 900 non-overlapping patches. A single patch contains images of 366x366 pixels for the 10-meter bands, 183x183 pixels for the 20-meter bands and 61x61 pixels for the 60-meter bands. The patch size was chosen so that the tile size divides evenly into patches at all 3 spatial resolutions of Sentinel-2.
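The patch sizes above follow from simple arithmetic, sketched below. A standard Sentinel-2 tile is 10980, 5490 and 1830 pixels wide at 10 m, 20 m and 60 m respectively, and 900 patches correspond to a 30 x 30 grid. The helper names are hypothetical. Zero-based grid indices are also an assumption, since the card does not state how `xx` and `yy` in `patch_xx_yy` map to rows and columns.

```python
# Side length of a standard Sentinel-2 tile in pixels, per resolution (metres).
TILE_SIDE = {10: 10980, 20: 5490, 60: 1830}
PATCHES_PER_SIDE = 30  # 30 x 30 = 900 non-overlapping patches


def patch_side(resolution):
    """Return the patch side in pixels, checking that the division is exact."""
    side, remainder = divmod(TILE_SIDE[resolution], PATCHES_PER_SIDE)
    if remainder != 0:
        raise ValueError(f"Tile does not divide evenly at {resolution} m")
    return side


def patch_window(i, j, resolution):
    """Pixel bounds (start_i, stop_i, start_j, stop_j) of grid cell (i, j).

    Assumption: i and j are zero-based grid indices.
    """
    side = patch_side(resolution)
    return (i * side, (i + 1) * side, j * side, (j + 1) * side)


for resolution in (10, 20, 60):
    print(resolution, patch_side(resolution))
```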
#### Annotation process
The Indicative Crop Classification (ICC) scheme was developed by the Food and Agriculture Organization (FAO) of the United Nations. It is an approach to produce a harmonized vocabulary and taxonomy for crops and plants that are used in food production. Sen4AgriNet adopts and customises an extended version of FAO ICC in order to create a universally applicable crop label nomenclature for the collected LPIS data with the following benefits:
- A single language (English) is used for naming all classes across all participating countries.
- Classes are normalized among different datasets.
- Hierarchical class structure is adopted. Depending on the application different levels of classes can be used.
- Additional non-agricultural classes are used (e.g. "fallow land", "barren land", etc.) to model Remote Sensing spectral signatures since agricultural parcels co-exist with other unrelated classes in satellite images.
The presented custom FAO/CLC classification scheme has a total of 9 groups and 168 classes and sub-classes. Of these, 161 classes/sub-classes are crop-related, 4 are major CORINE Land Cover (CLC) classes (included as sub-classes in this hierarchy), 2 are the fallow and barren lands, and 1 is the no-data sub-class.
This crop taxonomy was used to create the `labels` mask. In addition, a second annotation mask is provided (`parcels`) where each parcel obtains a unique identifier, regardless of the crops cultivated in it.
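The two masks can be combined to move from pixel-level to parcel-level labels. The sketch below assigns each parcel the most frequent class code among its pixels. It assumes that both masks are 2-D arrays on the same grid and that a parcel value of `0` marks pixels outside any parcel. The function name and the class codes in the toy example are illustrative and are not taken from the taxonomy.

```python
import numpy as np


def parcel_majority_labels(parcels, labels, background=0):
    """Return {parcel_id: most frequent class code} for one patch.

    Assumption: `background` marks pixels that belong to no parcel.
    """
    result = {}
    for parcel_id in np.unique(parcels):
        if parcel_id == background:
            continue
        # Count the class codes of all pixels inside this parcel.
        values, counts = np.unique(labels[parcels == parcel_id], return_counts=True)
        result[int(parcel_id)] = int(values[np.argmax(counts)])
    return result


# Toy 3x3 example with made-up parcel ids and class codes.
parcels = np.array([[1, 1, 0], [2, 2, 2], [0, 2, 0]])
labels = np.array([[110, 110, 0], [120, 120, 130], [0, 120, 0]])
print(parcel_majority_labels(parcels, labels))
```

The number of keys in the returned dictionary also gives a parcel count for the patch.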
### Personal and Sensitive Information
None.
## Considerations for Using the Data
### Social Impact of Dataset
We believe that Sen4AgriNet can serve as a labelled benchmark dataset, tailored for the CAP and for Sentinel-2 imagery, which comes at no cost, and that it can spur numerous DL-based applications for crop type classification, parcel extraction, parcel counting and semantic segmentation. More importantly, the dataset can be extended to include other input data sources, including Sentinel-1 Synthetic Aperture Radar data and meteorological data, enabling a new family of applications in early-warning risk assessment and agricultural insurance.
## Additional Information
### Licensing Information
MIT License.
### Citation Information
```
@ARTICLE{
9749916,
author={Sykas, Dimitrios and Sdraka, Maria and Zografakis, Dimitrios and Papoutsis, Ioannis},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
title={A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning},
year={2022},
doi={10.1109/JSTARS.2022.3164771}
}
```
Provider:
paren8esis
Summary of Original Information
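Dataset Overview
Name: Sen4AgriNet
Type: Sentinel-2-based, multi-country time-series benchmark dataset
Purpose: Agricultural monitoring applications, particularly with machine learning and deep learning
Coverage: 2019-2020 for Catalonia and 2019 for France
Data structure:
- Sample composition: Each sample contains the full name, year, name, country code and tile identifier of the patch, together with the data of the Sentinel-2 bands, the parcel mask, the label mask and the timestamps.
- Band information: 13 bands, B01 to B12 plus B8A. Each band array has shape (T, H, W), where T is the number of observations and H and W are the image height and width.
- Label system: Based on the FAO Indicative Crop Classification (ICC) scheme, with 9 groups and 168 classes and sub-classes.
Data configurations:
- `complete`: The complete dataset
- `cat_2019`: Catalonia data for 2019 only
- `cat_2020`: Catalonia data for 2020 only
- `fr_2019`: France data for 2019 only
Language: All information in the dataset is in English (`en_GB`)
License: MIT License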
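Collected Summary
Dataset Introduction

Construction
The construction of Sen4AgriNet builds on the wave of open data released through the Land Parcel Identification System (LPIS) under the European Common Agricultural Policy (CAP). The research team systematically collected farmer declarations for Catalonia (2019-2020) and France (2019) and combined them with Sentinel-2 L1C imagery from the Copernicus programme (cloud coverage below 10%). During preprocessing, each satellite image was split into 900 non-overlapping patches, and each patch yields arrays of 366×366, 183×183 and 61×61 pixels for the 10 m, 20 m and 60 m bands respectively. To unify the label systems of the different countries, the team customised a hierarchical crop taxonomy of 9 groups and 168 classes and sub-classes, based on the Indicative Crop Classification (ICC) scheme of the Food and Agriculture Organization (FAO). This taxonomy was used to generate the pixel-level label mask and the parcel identifier mask, resulting in a standardised multi-temporal, multi-year, multi-country remote sensing benchmark dataset.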
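Usage
Sen4AgriNet is published in the Hugging Face datasets format and can be loaded directly through the dataset loader. Four configurations are provided to suit studies at different spatial and temporal scales: the complete dataset (`complete`), the Catalonia 2019 subset (`cat_2019`), the Catalonia 2020 subset (`cat_2020`) and the France 2019 subset (`fr_2019`). Each data instance contains multispectral band tensors (shape T×H×W), a list of timestamps, a parcel identifier mask and a crop label mask. Since the dataset has no predefined train/validation/test splits, users can define their own splitting strategy according to the task at hand. The dataset is suited to deep learning tasks such as crop type classification, parcel segmentation and phenological stage identification. It is advisable to normalise the spectral data before model training and to exploit the time-series nature of the data with temporal attention or recurrent neural network architectures, so as to make full use of the agro-ecological patterns in the multi-temporal information.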
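As a minimal sketch of the normalisation step mentioned above, the function below standardises a single `(T, H, W)` band array to zero mean and unit variance. Using statistics computed from the array itself is an assumption made for brevity. In practice, statistics computed once over the training set are often preferred. The function name `standardize_band` is hypothetical.

```python
import numpy as np


def standardize_band(band, eps=1e-6):
    """Scale a (T, H, W) band array to zero mean and unit variance."""
    band = band.astype(np.float32)
    mean = band.mean()
    std = band.std()
    # eps guards against division by zero for constant arrays.
    return (band - mean) / (std + eps)


band = np.arange(24, dtype=np.uint16).reshape(2, 3, 4)
normalized = standardize_band(band)
print(normalized.shape)
```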
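Background and Challenges
Background Overview
Sen4AgriNet (S4A) is a multi-year, multi-country benchmark dataset for agricultural monitoring, created in 2022 by Dimitrios Sykas, Maria Sdraka and colleagues at the Orion AI Lab of the National Observatory of Athens. It combines Sentinel-2 imagery with farmer declaration data (LPIS) for Catalonia (2019-2020) and France (2019), and aims to address two core problems of agricultural remote sensing under the European Common Agricultural Policy (CAP): the scarcity of annotated data and the inconsistency of labels across regions. By adopting the Indicative Crop Classification (ICC) scheme of the Food and Agriculture Organization (FAO), the team standardised a crop taxonomy of 9 groups and 168 classes and sub-classes. The result is the first pixel-level, multi-temporal, multi-country annotated resource with complete spectral information for deep-learning-based crop classification, semantic segmentation and phenology monitoring. Its open release supports research at the intersection of remote sensing and artificial intelligence in areas such as precision agriculture and food security monitoring, and is presented as a significant step in the digitalisation of EU agriculture.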
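Current Challenges
The challenges facing Sen4AgriNet concern both the problem domain and the construction process. At the domain level, the central challenge is the non-standardisation of labels caused by differences in language, cartographic projection and subsidy policy among the LPIS systems of European countries, which severely limits the spatial generalisation of deep learning models. A further challenge is how to use multi-temporal, multispectral Sentinel-2 data to capture crop phenology and inter-annual variability accurately, in order to cope with small sample sizes, class imbalance and blurred parcel boundaries in complex agricultural scenes. During construction, the team had to reconcile the open-data licences and format differences of the farmer declarations from the different regions (Catalonia and France), align the geometries from the local projections to the common projection of the satellite images, and design a unified label system that is compatible with the FAO classification hierarchy while also integrating non-agricultural land classes. Finally, the multi-resolution bands (10 m to 60 m) had to be co-registered in space and time and annotated within non-overlapping patches of 366×366 pixels, while keeping the dataset extensible and modular.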
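Common Scenarios
Classic Use Cases
The classic use case of Sen4AgriNet is crop classification and semantic segmentation with multi-temporal, multi-year, multi-country Sentinel-2 imagery. The dataset combines official LPIS (Land Parcel Identification System) annotations for agricultural parcels in Catalonia (Spain) and France, and provides time-series image patches containing all 13 spectral bands. Researchers can use it for pixel-level or parcel-level crop type identification, with deep learning models that capture the phenological changes and spatial heterogeneity of crop growth, in order to build agricultural monitoring models that generalise across regions. Its modular design also allows extension to other EU countries, providing a benchmark for large-scale, standardised interpretation of agricultural remote sensing data.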
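Academic Problems Addressed
The dataset addresses two long-standing problems at the intersection of remote sensing and artificial intelligence: the lack of agricultural parcel data with uniform annotations across time and regions, and the heterogeneity of labels caused by differences in the classification systems and languages of national LPIS systems. Sen4AgriNet introduces a standardised crop taxonomy based on the Indicative Crop Classification (ICC) framework of the Food and Agriculture Organization (FAO), which maps the local labels of different countries to a unified, hierarchical system of English class names and thereby closes the semantic gap. This allows researchers to train deep learning models that are robust in both space and time, improving the generalisation of crop classification and segmentation, and it provides key data support for precision agriculture research under the European Common Agricultural Policy (CAP).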
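Practical Applications
In practice, Sen4AgriNet can directly support subsidy verification and farmland monitoring under the EU Common Agricultural Policy (CAP). Deep learning models trained on the dataset can automatically identify the crop types within parcels, helping Paying Agencies verify farmer declarations efficiently and reducing the cost and time of manual field inspections. Its multi-temporal, multispectral nature can also be used for crop growth assessment, yield prediction and early-warning systems for agricultural hazards such as drought, pests and diseases. With its potential for extension with Sentinel-1 Synthetic Aperture Radar data and meteorological data, the dataset also offers a data foundation for agricultural insurance risk assessment and precision agriculture decision support systems.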
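Recent Research on the Dataset
Latest Research Directions
In remote sensing and intelligent agricultural monitoring, Sen4AgriNet is pushing deep-learning-based crop classification and semantic segmentation towards spatio-temporal generalisation across countries and years. The dataset combines the Land Parcel Identification System (LPIS) data opened under the EU Common Agricultural Policy (CAP) with Sentinel-2 multispectral imagery, addressing the earlier lack of standardised, large-scale, pixel-level annotated data. Current research focuses on using its multi-temporal, multispectral and parcel-level label properties to develop transferable crop type recognition models that cope with inconsistent crop classification systems and geographic variability among European countries. The dataset also shows strong potential in practical applications such as subsidy compliance monitoring, precision agriculture decision support and early risk warning, and its modular extensibility opens a path to integrating meteorological and radar data, making it a notable benchmark resource at the intersection of Earth observation and artificial intelligence.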
The content above was collected and summarised by 遇见数据集.



