METAR cloud cover over weather stations in Sentinel-2 satellite images
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14691473
Description:
This dataset contains the processed Sentinel-2 Level-1C (L1C) satellite images used in the Machine Learning model of MSCC (also known as AIMLSSE) for the ordinal classification of cloud cover in the METAR format commonly used by weather stations.
The dataset contains 12,749 satellite image cut-outs, each covering 16 × 16 km with a weather station at the center. The images are reduced to a size of 300 × 300 px and stored in the TIF format, with RGB color channels that have been pre-multiplied by the NDSI factor of each Sentinel-2 color band (Bands 2, 3, and 4) and a SWIR band of a similar resolution (B11). The band names and resolutions are also found here:
bands_metadata.csv
Band   Description                  Resolution   Central Wavelength
B2     Blue                         10 m         490 nm
B3     Green                        10 m         560 nm
B4     Red                          10 m         665 nm
B11    Short Wave Infrared (SWIR)   20 m         1610 nm
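The tile size and band resolutions above imply how much ground each stored pixel covers. The following is a small arithmetic sketch, not part of the dataset tooling; the function names are illustrative, and it assumes the 16 km edge maps exactly onto the 300 px edge.

```python
# Approximate ground footprint of one pixel in the 300 x 300 px cut-outs.
TILE_SIZE_M = 16_000   # 16 km tile edge, from the description
TILE_SIZE_PX = 300     # edge length of the stored image in pixels

# Native Sentinel-2 resolutions from the band table (metres per pixel).
NATIVE_RESOLUTION_M = {"B2": 10, "B3": 10, "B4": 10, "B11": 20}


def metres_per_pixel(tile_size_m=TILE_SIZE_M, tile_size_px=TILE_SIZE_PX):
    """Ground distance covered by one pixel of the stored cut-out."""
    return tile_size_m / tile_size_px


def downsampling_factor(band):
    """How many native pixels (per axis) are merged into one stored pixel."""
    return metres_per_pixel() / NATIVE_RESOLUTION_M[band]


if __name__ == "__main__":
    print(f"{metres_per_pixel():.1f} m per stored pixel")
    for band in NATIVE_RESOLUTION_M:
        print(f"{band}: {downsampling_factor(band):.2f}x per axis")
```

Under these assumptions each stored pixel covers roughly 53 m, so the 10 m bands are reduced by a factor of about 5.3 per axis and B11 by about 2.7.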
Be aware that data mismatches between weather station data and satellite images are unavoidable. The time difference between satellite and weather station observations has been capped at 30 minutes to ensure relatively stable weather conditions while still providing enough data to train, validate, and test the Machine Learning model. Additionally, some stations are no longer located at their original position, which is not reflected in the input data.
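The 30-minute cap can be illustrated with a short matching sketch. This is not the authors' pipeline: the function name `nearest_observation` is hypothetical, the timestamps in the usage example are made up, and treating a difference of exactly 30 minutes as acceptable is an assumption.

```python
from datetime import datetime, timedelta, timezone

MAX_TIME_DIFFERENCE = timedelta(minutes=30)  # cap stated in the description


def nearest_observation(acquisition_time, observation_times,
                        max_diff=MAX_TIME_DIFFERENCE):
    """Return the observation closest in time to the satellite acquisition,
    or None if even the closest one is further away than max_diff."""
    if not observation_times:
        return None
    closest = min(observation_times, key=lambda t: abs(t - acquisition_time))
    if abs(closest - acquisition_time) > max_diff:
        return None  # assumption: differences above the cap are discarded
    return closest


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    acquisition = datetime(2023, 3, 1, 10, 40, tzinfo=timezone.utc)
    observations = [
        datetime(2023, 3, 1, 10, 20, tzinfo=timezone.utc),
        datetime(2023, 3, 1, 10, 50, tzinfo=timezone.utc),
    ]
    print(nearest_observation(acquisition, observations))
```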
The data itself is organized into three subsets for training, validation, and testing of Machine Learning models. Each directory is structured as outlined here:
├── [station name 1]/
│ ├── [Sentinel-2 product ID 1].tif
│ ├── [Sentinel-2 product ID 2].tif
│ └── ...
├── [station name 2]/
│ ├── [Sentinel-2 product ID 1].tif
│ ├── [Sentinel-2 product ID 2].tif
│ └── ...
└── ...
The station name directly identifies the weather station with a unique ID, while the Sentinel-2 product ID makes it possible to re-download, via the Copernicus Open Access Hub, the exact image from which the cut-out was taken. The data was collected from January 2023 to June 2023.
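Given this layout, the station name and product ID of every image can be recovered from its path. The sketch below uses only the standard library; the helper name `list_samples` is hypothetical, and it assumes each file name is exactly the product ID followed by `.tif`.

```python
from pathlib import Path


def list_samples(subset_dir):
    """Yield (station_name, product_id, path) for every .tif file that sits
    one level below subset_dir, i.e. inside a station directory."""
    for tif_path in sorted(Path(subset_dir).glob("*/*.tif")):
        station_name = tif_path.parent.name  # directory name = station ID
        product_id = tif_path.stem           # file name without ".tif"
        yield station_name, product_id, tif_path
```

Calling `list_samples` on one of the three subset directories (whatever they are named in the download) yields one tuple per image, which can then be joined with the label files by station and product ID.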
Assuming that the locations are correct, the stations have been clustered for an approximate 80-10-10 distribution for training, validation, and test respectively, ensuring no overlapping regions exist between the three subsets.
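A quick sanity check on the split is to confirm that no station name occurs in more than one subset. The sketch below checks station names only; it does not verify geographic separation, which would require station coordinates. The function name and the station names in the example are made up.

```python
from itertools import combinations


def overlapping_stations(subsets):
    """Map each pair of subset names to the station names they share.

    `subsets` maps a subset name to an iterable of station names.
    An empty result means no station appears in two subsets."""
    overlaps = {}
    for (name_a, stations_a), (name_b, stations_b) in combinations(subsets.items(), 2):
        shared = set(stations_a) & set(stations_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


if __name__ == "__main__":
    # Made-up station names, for illustration only.
    example = {
        "training": ["STATION_A", "STATION_B", "STATION_C"],
        "validation": ["STATION_D"],
        "test": ["STATION_E"],
    }
    print(overlapping_stations(example))  # empty dict: no shared stations
```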
The mapping of the images to the METAR reference values, the manual labels, and the time difference between satellite image and METAR observation are provided in the following files:
training_labels.csv
validation_labels.csv
test_labels.csv
The automatic cloud cover is found in the column “max cloud cover”, while the manually adjusted labels are located in the column “true cloud cover”. The label distributions apply to the dataset labels.
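To see how often the manual labels differ from the automatic ones, the two columns can be compared row by row. This is a minimal sketch: the column names come from the description, but the sample rows, the `product id` column, and the label encoding (METAR codes such as FEW or OVC) are assumptions, since the actual file layout is not documented here.

```python
import csv
import io

AUTO_COLUMN = "max cloud cover"     # column name from the description
MANUAL_COLUMN = "true cloud cover"  # column name from the description


def count_adjusted_labels(csv_file):
    """Return (total_rows, adjusted_rows), where a row counts as adjusted
    when the manual label differs from the automatic one."""
    total = adjusted = 0
    for row in csv.DictReader(csv_file):
        total += 1
        if row[AUTO_COLUMN].strip() != row[MANUAL_COLUMN].strip():
            adjusted += 1
    return total, adjusted


# Made-up sample rows; the real files may use other columns and encodings.
SAMPLE = """product id,max cloud cover,true cloud cover
example_a,FEW,FEW
example_b,BKN,OVC
example_c,SCT,SCT
"""

if __name__ == "__main__":
    print(count_adjusted_labels(io.StringIO(SAMPLE)))
```

For a real file, pass an open handle instead, for example `open("training_labels.csv", newline="")`.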
The weights for the dataset labels are provided in:
training_weights.csv
The weights correspond to the inverse frequency of label occurrence in the dataset and can be used for both loss functions and data samplers.
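Inverse-frequency weighting can be sketched as follows. The normalization used in training_weights.csv is not stated, so the plain `1 / count` form below is an assumption, as are the function names and the example labels. The sampler uses the standard library's `random.choices` rather than any particular framework's data sampler.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each label by 1 / (number of times it occurs)."""
    counts = Counter(labels)
    return {label: 1.0 / count for label, count in counts.items()}


def sample_balanced(items, labels, k, seed=0):
    """Draw k items with replacement so that every class is, in expectation,
    drawn equally often regardless of how many items it contains."""
    weights = inverse_frequency_weights(labels)
    rng = random.Random(seed)
    per_item_weights = [weights[label] for label in labels]
    return rng.choices(items, weights=per_item_weights, k=k)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    labels = ["CLR", "CLR", "CLR", "OVC"]
    items = ["img_0", "img_1", "img_2", "img_3"]
    print(inverse_frequency_weights(labels))
    print(sample_balanced(items, labels, k=8))
```

With these weights the per-item weights of each class sum to 1, which is why a rare class is drawn as often as a frequent one. The same per-class values could be passed to a weighted loss function instead of a sampler.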
Created:
2025-01-27



