Clouds-1500
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/2khchjbgzr
Description:
Clouds-1500 is an extension of the original Clouds-1000 dataset (https://data.mendeley.com/datasets/4pw8vfsnpx). It contains sky images taken between March 2021 and January 2023 by ground-based cameras pointed toward the horizon, facing north and south, at the facilities of the Federal University of Santa Catarina and at the solarimetric station of the Photovoltaic Energy Laboratory in the Sapiens Technological Park. The dataset is part of the Machine Learning Methods Project for Nowcasting for Solar Energy, conducted by the Laboratory of Image Processing and Computer Graphics (LAPIX) of the National Institute of Digital Convergence (INCoD).
The image annotation process was carried out manually by a team comprising computer scientists, meteorologists, and an experienced sky observer from the Brazilian Air Force Base on Santa Catarina Island. Annotations were made using the polygon tool on the Supervisely platform to mark the clouds visible in each image.
The dataset was created to help machine learning algorithms identify clouds in images taken from the ground with ordinary cameras. It uses a practical, height-based classification system that groups clouds into four classes: Cirriforms, Cumuliforms, Stratiforms, and Stratocumuliforms. The annotations also include a category for background objects such as trees and buildings. This classification is intended to improve nowcasting in the solar energy sector by estimating how much solar radiation clouds over solar energy facilities may absorb. Two considerations motivated the choice of four groups: they are sufficient to forecast solar radiation interference efficiently, and earlier attempts at a finer classification produced poor results when training neural networks. That poor performance is likely due to the strong similarity among the actual cloud types within each superclass and to the ill-defined, transparent appearance of clouds, which makes precise classification difficult.
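As an illustration of how the four cloud groups plus the background category might be used downstream, the sketch below computes the fraction of visible sky covered by each group in a segmentation mask. The integer label IDs, the treatment of unlabeled pixels as clear sky, and the names `CloudClass` and `cloud_fractions` are all hypothetical assumptions, not the dataset's documented encoding; check the dataset's own metadata before relying on any IDs.

```python
from enum import IntEnum

import numpy as np


class CloudClass(IntEnum):
    """Hypothetical integer IDs for the annotation categories (assumption)."""
    SKY = 0  # assumption: unlabeled pixels are treated as clear sky
    CIRRIFORM = 1
    CUMULIFORM = 2
    STRATIFORM = 3
    STRATOCUMULIFORM = 4
    BACKGROUND = 5  # trees, buildings, and other non-sky objects


def cloud_fractions(mask: np.ndarray) -> dict[str, float]:
    """Fraction of visible sky covered by each cloud group.

    Background pixels are excluded from the denominator because they
    occlude the sky rather than being part of it.
    """
    sky_pixels = mask != int(CloudClass.BACKGROUND)
    total = int(sky_pixels.sum())
    if total == 0:
        return {}  # image shows no sky at all
    return {
        c.name: float((mask == int(c)).sum()) / total
        for c in CloudClass
        if c not in (CloudClass.SKY, CloudClass.BACKGROUND)
    }
```

For example, a 2×3 mask `[[1, 1, 5], [0, 2, 5]]` has four sky pixels, so Cirriforms cover half of the visible sky and Cumuliforms a quarter.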
The dataset underwent several validation steps to ensure its quality and reliability. First, three team members inspected a randomly chosen subset of images to check the consistency and quality of the manual annotations. In a second, semi-automated step, the dataset was split into training and validation sets, and a semantic segmentation convolutional neural network (PPLite B2) was trained on it and used to identify the 100 images with the lowest classification scores. Because these images were the most likely to contain annotation errors, a meteorologist from the team manually reviewed and corrected them. This procedure helped detect and fix remaining errors, improving the overall quality and reliability of the dataset.
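The semi-automated review step can be sketched as "score every image with the trained model, then send the lowest-scoring ones to a human". The description does not say how the classification score was computed; the sketch below assumes it is the mean of the per-pixel maximum softmax probability. The function names `image_score` and `select_for_review` are hypothetical.

```python
import heapq

import numpy as np


def image_score(probs: np.ndarray) -> float:
    """Mean per-pixel confidence for one image.

    `probs` has shape (num_classes, H, W) and holds softmax probabilities.
    Using the mean of the per-pixel maximum is an assumption, not the
    dataset authors' documented metric.
    """
    return float(probs.max(axis=0).mean())


def select_for_review(scores: dict[str, float], k: int = 100) -> list[str]:
    """Return the IDs of the k lowest-scoring images, lowest first."""
    return heapq.nsmallest(k, scores, key=scores.__getitem__)
```

With the default `k=100`, `select_for_review` returns a review set of the same size as the one described above; `heapq.nsmallest` avoids sorting the full score table when only the bottom few images are needed.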
Created: 2023-10-31



