
METAR cloud cover over weather stations in Sentinel-2 satellite images

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14691473
Description:
This dataset contains the processed Sentinel-2 satellite images of level L1C used in the Machine Learning model of MSCC (also known as AIMLSSE) for the ordinal classification of cloud cover in the METAR format commonly used by weather stations. It comprises 12749 cut-outs of satellite images, each covering 16 × 16 km with a weather station in the center. The images are reduced to a size of 300 × 300 px.

The images are in the TIF format, with RGB color channels that have been pre-multiplied by the NDSI factor of each Sentinel-2 color band (Band 2, 3, and 4) and a SWIR band of a similar resolution (B11). The names and resolutions are also found in bands_metadata.csv:

Band  Description                 Resolution  Central Wavelength
B2    Blue                        10 m        490 nm
B3    Green                       10 m        560 nm
B4    Red                         10 m        665 nm
B11   Short Wave Infrared (SWIR)  20 m        1610 nm

Be aware that data mismatches between weather station data and satellite images are unavoidable. The time difference between satellite and weather station observations has been capped at 30 minutes to ensure relatively stable weather conditions while still providing enough data to train, validate, and test the Machine Learning model. Additionally, some stations are no longer at their original location, which is not reflected in the input data.

The data is organized into three subsets for training, validation, and testing of Machine Learning models. Each directory is structured as outlined here:

├── [station name 1]/
│   ├── [Sentinel-2 product ID 1].tif
│   ├── [Sentinel-2 product ID 2].tif
│   └── ...
├── [station name 2]/
│   ├── [Sentinel-2 product ID 1].tif
│   ├── [Sentinel-2 product ID 2].tif
│   └── ...
└── ...

The station name directly identifies the weather station with a unique ID, while the Sentinel-2 product ID makes it possible to re-download, via the Copernicus Open Access Hub, the exact image from which the cut-out was taken. The data was collected from January 2023 until June 2023.
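The directory layout described above can be traversed with the Python standard library alone. The following is a minimal sketch, not part of the dataset's own tooling: the function name `index_subset` is hypothetical, and it assumes that each file name minus its `.tif` extension is the Sentinel-2 product ID.

```python
from pathlib import Path


def index_subset(subset_dir):
    """Map each station directory name to the sorted product IDs of its .tif files.

    Assumption: the file name without the ".tif" extension is the
    Sentinel-2 product ID, as suggested by the directory outline.
    """
    index = {}
    for station_dir in sorted(Path(subset_dir).iterdir()):
        if not station_dir.is_dir():
            continue  # skip stray files next to the station directories
        index[station_dir.name] = sorted(
            image.stem for image in station_dir.glob("*.tif")
        )
    return index
```

Calling `index_subset` on the root of one subset (the actual directory names of the three subsets are not stated here, so substitute the path from the download) returns a dictionary that can be used to count images per station or to look up labels by station and product ID.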

Assuming that the locations are correct, the stations have been clustered for an approximate 80-10-10 distribution for training, validation, and test respectively, ensuring that no overlapping regions exist between the three subsets.

The mapping of the images to the METAR reference values, the manual labels, and the time difference between satellite image and METAR observation are provided in the following files:

training_labels.csv
validation_labels.csv
test_labels.csv

Automatic cloud cover is found in column “max cloud cover”, while the manually adjusted labels are located in column “true cloud cover”. The label distributions apply to the dataset labels.

The weights for the dataset labels are provided in:

training_weights.csv

The weights correspond to the inverse frequency of label occurrence in the dataset and can be used for both loss functions and data samplers.
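
As an illustration of inverse-frequency weighting, the sketch below computes one weight per label class and expands it to one weight per sample. The function names are hypothetical, the label strings are standard METAR cloud cover codes used only as example values, and the normalization (total count divided by class count) is an assumption; the values in training_weights.csv may be scaled differently.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, proportional to 1 / class frequency.

    Assumed normalization: weight = total samples / samples in class,
    so a class that makes up half the data gets weight 2.0.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: total / count for label, count in counts.items()}


def per_sample_weights(labels):
    """Expand class weights to one weight per sample, e.g. for a weighted sampler."""
    class_weights = inverse_frequency_weights(labels)
    return [class_weights[label] for label in labels]
```

Class-level weights are the form usually passed to a weighted loss function, while per-sample weights are the form usually expected by a weighted data sampler, so rare cloud cover classes are seen more often during training.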

Created:
2025-01-27