
KappaSet: Sentinel-2 KappaZeta Cloud and Cloud Shadow Masks

Source: Mendeley Data. Updated: 2024-05-17. Indexed: 2024-06-27.
Download link:
https://zenodo.org/records/7100327
Resource description:
General information
The dataset consists of 9251 labelled sub-tiles from 1038 Sentinel-2 (S2) Level-1C (L1C) products distributed over the globe. In terms of seasonal distribution, the S2 products fall into the following groups:
Winter products: 29 austral and 142 boreal S2 products
Spring products: 45 austral and 257 boreal S2 products
Summer products: 30 austral and 293 boreal S2 products
Autumn products: 29 austral and 213 boreal S2 products
Each S2 product was oversampled to 10 m resolution and split into sub-tiles of 512 x 512 pixels. From each S2 product, the most challenging sub-tiles (~5 per product) were selected for labelling. Each selected L1C S2 product represents different clouds, such as cumulus, stratus, or cirrus, which are spread over various geographical locations around the world.
The pixel-wise classification map consists of the following categories:
0 – UNDEFINED: pixels for which the labeler is not sure which class they belong to;
1 – CLEAR: pixels without clouds or cloud shadows;
2 – CLOUD SHADOW: pixels with cloud shadows;
3 – SEMI TRANSPARENT CLOUD: pixels with thin clouds through which the land is visible; includes cirrus clouds at the high cloud level (5-15 km);
4 – CLOUD: pixels with cloud; includes stratus and cumulus clouds at the low cloud level (from 0-0.2 km to 2 km);
5 – MISSING: missing or invalid pixels.
The dataset was labelled using the Computer Vision Annotation Tool (CVAT) and Segments.ai. Because Segments.ai allows an active learning process to be integrated, the labelling was performed semi-automatically. The distribution of the dataset is presented in the Figure below. Color represents the season from which the product was chosen.
The dataset limitations must be considered: the data mostly covers terrestrial regions (around 91%) and includes some water areas (around 9%); only around 7% of the dataset contains snow. The current sub-tiles are not georeferenced.
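The class codes and seasonal counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's tooling: the names `KappaClass`, `total_products`, `class_fractions` and `to_binary_cloud` are hypothetical, the mask is assumed to be already loaded as a 2-D list of integer class codes (the on-disk file format is not described here, so loading is omitted), and the cloud / not-cloud merging scheme is an assumption chosen for illustration only.

```python
from collections import Counter
from enum import IntEnum


class KappaClass(IntEnum):
    """Pixel class codes as listed in the dataset description."""
    UNDEFINED = 0
    CLEAR = 1
    CLOUD_SHADOW = 2
    SEMI_TRANSPARENT_CLOUD = 3
    CLOUD = 4
    MISSING = 5


# Seasonal product counts quoted in the description: (austral, boreal).
SEASON_COUNTS = {
    "winter": (29, 142),
    "spring": (45, 257),
    "summer": (30, 293),
    "autumn": (29, 213),
}


def total_products(counts=SEASON_COUNTS):
    """Sum austral and boreal products over all seasons."""
    return sum(austral + boreal for austral, boreal in counts.values())


def class_fractions(mask):
    """Return the share of pixels per class in a 2-D list of class codes."""
    tally = Counter(KappaClass(v) for row in mask for v in row)
    n = sum(tally.values())
    return {cls: tally.get(cls, 0) / n for cls in KappaClass}


def to_binary_cloud(mask, ignore=-1):
    """Collapse the six classes to 1 = cloud (thin or thick), 0 = not cloud.

    UNDEFINED and MISSING pixels are set to `ignore`. This merging scheme
    is an illustrative assumption, not something the dataset prescribes.
    """
    cloudy = {KappaClass.SEMI_TRANSPARENT_CLOUD, KappaClass.CLOUD}
    skipped = {KappaClass.UNDEFINED, KappaClass.MISSING}
    return [
        [ignore if v in skipped else int(v in cloudy) for v in row]
        for row in mask
    ]


if __name__ == "__main__":
    # Toy 3 x 4 mask standing in for a real 512 x 512 label sub-tile.
    toy = [[1, 1, 4, 4], [1, 2, 3, 4], [0, 2, 1, 5]]
    print("total products:", total_products())
    for cls, share in class_fractions(toy).items():
        print(f"{cls.name:>22}: {share:.3f}")
    print(to_binary_cloud(toy))
```

Summing the seasonal counts reproduces the stated total of 1038 products, which is a quick consistency check on the figures above. Mapping UNDEFINED and MISSING to an ignore value keeps uncertain pixels out of a training loss or an accuracy score, but whether to treat thin cloud as cloud depends on the application.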
Contributions and Acknowledgements
The data were annotated by Olga Wold, Mariana Rohtsalu, Nikita Murin, Joosep Truupõld and Fariha Harun. Data verification and software development were performed by Indrek Sünter, Heido Trofimov, Anton Kostiukhin, Marharyta Domnich, Mihkel Järveoja, Olga Wold and Tetiana Shtym. The methodology was developed by Kaupo Voormansik, Indrek Sünter, Marharyta Domnich and Tetiana Shtym. The data were collected, processed, and checked as part of the “KappaMask: AI-based Cloudmask Processor for Sentinel-2” project. We thank the Segments.ai team for providing a wonderful annotation tool that was actively used to prepare the dataset. Finally, we thank the European Space Agency (ESA) for supporting, advising, and funding the project. The project was funded by the European Space Agency, Contract No. 4000132124/20/I-DT.

Created: 2023-06-28
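Background overview
The dataset contains 9251 labelled Sentinel-2 sub-tiles covering different seasons and cloud types worldwide, intended for semantic segmentation of clouds and cloud shadows. The annotations comprise six pixel-level classes and are suited to deep learning and remote sensing analysis.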