CloudSEN12 - a global dataset for semantic understanding of cloud and cloud shadow in Sentinel-2

Mendeley Data | Updated 2024-06-27 | Indexed 2024-06-27

Download link:

https://zenodo.org/record/7034410

Resource summary:

Description

CloudSEN12 is a large dataset for cloud semantic understanding that consists of 9880 regions of interest (ROIs). Each ROI has five 5090 x 5090 meter image patches (IPs) collected on different dates; we manually chose the images to guarantee that each IP inside an ROI matches one of the following cloud cover groups:

- clear (0%)
- low-cloudy (1% - 25%)
- almost clear (25% - 45%)
- mid-cloudy (45% - 65%)
- cloudy (> 65%)

The IP is the core unit of CloudSEN12. Each IP contains data from Sentinel-2 optical levels 1C and 2A, Sentinel-1 Synthetic Aperture Radar (SAR), a digital elevation model, surface water occurrence, land cover classes, and cloud mask results from eight cutting-edge cloud detection algorithms.

To support standard, weakly, and self-/semi-supervised learning procedures, CloudSEN12 includes three distinct forms of hand-crafted labelling data: high-quality, scribble, and no annotation. Each ROI is randomly assigned to one annotation group:

- 2000 ROIs with pixel-level annotation, where the average annotation time is 150 minutes (high-quality group).
- 2000 ROIs with scribble-level annotation, where the annotation time is 15 minutes (scribble group).
- 5880 ROIs with annotation only in the cloud-free (0%) image (no annotation group).

For high-quality labels, we use the Intelligence foR Image Segmentation (IRIS) [iris2019] active learning technology, a system that combines human photo-interpretation and machine learning. For scribble labels, ground truth pixels were drawn using IRIS but without ML support. Finally, the no annotation dataset is generated automatically, with manual annotation only in the clear image patch.

The dataset is already available here: https://shorturl.at/cgjtz. Check out our website https://cloudsen12.github.io/ for examples of how to download the dataset via STAC.
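The grouping and the counts quoted in the description can be sketched in a few lines of Python. This is only an illustration and not part of the CloudSEN12 tooling: the function and constant names are hypothetical. The description lists 25%, 45% and 65% in two groups each and leaves the range between 0% and 1% unassigned, so the sketch assumes that shared boundaries belong to the lower group and that any non-zero value up to 25% counts as low-cloudy. The pixel count additionally assumes a 10 m grid (the finest Sentinel-2 band resolution), which the description does not state.

```python
# Hypothetical helper names; thresholds are taken from the dataset description.
# Assumption: a shared boundary (25, 45, 65) is assigned to the lower group,
# and values in (0, 1) are treated as "low-cloudy".
CLOUD_GROUPS = [
    ("low-cloudy", 0.0, 25.0),
    ("almost clear", 25.0, 45.0),
    ("mid-cloudy", 45.0, 65.0),
    ("cloudy", 65.0, 100.0),
]


def cloud_cover_group(percent: float) -> str:
    """Map a cloud cover percentage to one of the five group names."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"cloud cover must be within [0, 100], got {percent}")
    if percent == 0.0:
        return "clear"
    for name, lower, upper in CLOUD_GROUPS:
        if lower < percent <= upper:
            return name
    raise AssertionError("unreachable: all values in (0, 100] are covered")


# ROI counts per annotation group, as stated in the description.
ROIS_PER_ANNOTATION_GROUP = {
    "high-quality": 2000,
    "scribble": 2000,
    "no annotation": 5880,
}
IPS_PER_ROI = 5          # five image patches per ROI
PATCH_SIDE_M = 5090      # patch side length in meters
ASSUMED_PIXEL_SIZE_M = 10  # assumption: 10 m grid, not stated in the description

total_rois = sum(ROIS_PER_ANNOTATION_GROUP.values())
total_ips = total_rois * IPS_PER_ROI
patch_side_px = PATCH_SIDE_M // ASSUMED_PIXEL_SIZE_M

if __name__ == "__main__":
    print(cloud_cover_group(30.0))
    print(total_rois, total_ips, patch_side_px)
```

The three annotation groups add up to the 9880 ROIs quoted at the top of the description, which gives 49400 image patches at five patches per ROI.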

Created:

2023-06-28
