Data and codes from: Comparison of Solar Imaging Feature Extraction Methods in the Context of Space Weather Prediction with Deep Learning-Based Models

Recherche Data Gouv France. Updated: 2025-01-01. Indexed: 2026-04-09.

Download link:

https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/DZT7DS


Resource description:

This dataset contains replication data for the paper "Comparison of Solar Imaging Feature Extraction Methods in the Context of Space Weather Prediction with Deep Learning-Based Models". It includes two HDF5 (Hierarchical Data Format) files stored using HDFStore:

solar_extracted_features_v01_2010-2020.h5: the features extracted with the 6 different techniques for the 19.3 nm wavelength.
serenade_predictions_v01.h5: the SERENADE outputs.

Both files contain several datasets labeled with "keys", which correspond to the extraction method. The key names are:

gn_1024: the GoogLeNet extractor with 1024 components.
pca_1024: the Principal Component Analysis technique keeping 1024 components.
ae_1024: the AutoEncoder with a latent space of 1024.
gn_256 (only in solar_extracted_features_v01_2010-2020.h5): the GoogLeNet extractor with 256 components.
pca_256: the Principal Component Analysis technique keeping 256 components.
ae_256: the AutoEncoder technique with a latent space of 256.
vae_256 (only in solar_extracted_features_v01_2010-2020.h5): the Variational AutoEncoder technique with a latent space of 256.
vae_256_old (only in serenade_predictions_v01.h5): the output predictions of SERENADE using the VAE extracted features, with the hyperparameters optimized for GoogLeNet.
vae_256_new (only in serenade_predictions_v01.h5): the output predictions of SERENADE using the VAE extracted features, with the alternative architecture.

All of the above models are explained and detailed in the paper.
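The key list above can be captured in a small lookup, which makes it easy to check that a key is expected in a given file before trying to open it. This is only a sketch transcribed from the description; the names COMMON_KEYS, KEYS_BY_FILE and check_key are illustrative and are not part of the dataset's own code.

```python
# Keys listed in the description for both files.
COMMON_KEYS = {"gn_1024", "pca_1024", "ae_1024", "pca_256", "ae_256"}

# Keys per file, transcribed from the description (not read from the files).
KEYS_BY_FILE = {
    "solar_extracted_features_v01_2010-2020.h5": COMMON_KEYS | {"gn_256", "vae_256"},
    "serenade_predictions_v01.h5": COMMON_KEYS | {"vae_256_old", "vae_256_new"},
}


def check_key(file_name: str, key: str) -> str:
    """Return `key` if the description lists it for `file_name`, else raise KeyError."""
    if file_name not in KEYS_BY_FILE:
        raise KeyError(f"unknown file: {file_name!r}")
    keys = KEYS_BY_FILE[file_name]
    if key not in keys:
        raise KeyError(
            f"{key!r} is not listed for {file_name!r}; expected one of {sorted(keys)}"
        )
    return key
```

To confirm the keys actually present in a downloaded file, pandas' `pd.HDFStore(path, mode="r").keys()` lists them (each prefixed with `/`).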
The files can be read with the pandas package for Python as follows:

    import pandas as pd
    df = pd.read_hdf('file_name.h5', key='model_name')

Replace file_name with either solar_extracted_features_v01_2010-2020.h5 or serenade_predictions_v01.h5, and model_name with one of the keys in the list above.

The extracted features dataset returns a pandas DataFrame indexed by datetime, with either 1024 or 256 columns of features. An additional column indicates to which subset (train, validation or test) the corresponding row belongs.

The SERENADE outputs dataset returns a DataFrame indexed by datetime, with 4 columns:

Observations: the first column contains the true daily maximum of the Kp index.
Predictions: the second column contains the predicted mean of the daily maximum of the Kp index.
Standard Deviation: the third column contains the standard deviation, as the predictions are probabilistic.
Model: this column specifies which feature extractor model provided the inputs used to generate the predictions.

The class codes of the AE and VAE feature extractors are provided in AEs_class.py and VAE_class.py, and their weights in the checkpoints best_AE_1024.ckpt, best_AE_256.ckpt and best_VAE.ckpt. The figures in the manuscript can be reproduced using the codes named after the corresponding figure. The files 6_mins_predictions and seed_variation contain the SERENADE predictions needed to reproduce figures 7, 8, 9 and 10.
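Once a table is loaded, two common next steps are splitting the features by subset and scoring the probabilistic predictions. The sketch below is a hedged illustration, not the authors' code: the helper names are hypothetical, the name of the subset column is not given in the description and must be passed in by the user, and the prediction columns are accessed by position (first, second, third) as the description orders them. Reading HDF5 files with pandas also requires the PyTables package (`tables`).

```python
import numpy as np
import pandas as pd


def split_features(df: pd.DataFrame, subset_col: str) -> dict:
    """Split a features table into {subset label: feature-only DataFrame}.

    `subset_col` is the name of the extra column holding the subset label;
    its actual name is an assumption the user must check in the file.
    """
    return {
        label: part.drop(columns=subset_col)
        for label, part in df.groupby(subset_col)
    }


def rmse(preds: pd.DataFrame) -> float:
    """Root-mean-square error between column 1 (observed) and column 2 (predicted mean)."""
    err = preds.iloc[:, 1] - preds.iloc[:, 0]
    return float(np.sqrt((err ** 2).mean()))


def coverage(preds: pd.DataFrame, k: float = 1.0) -> float:
    """Fraction of observations within k standard deviations (column 3) of the predicted mean."""
    err = (preds.iloc[:, 1] - preds.iloc[:, 0]).abs()
    return float((err <= k * preds.iloc[:, 2]).mean())


# Example usage, assuming the files are in the working directory and that
# the subset column is called "subset" (hypothetical name):
#
#   feats = pd.read_hdf("solar_extracted_features_v01_2010-2020.h5", key="ae_256")
#   parts = split_features(feats, "subset")
#   preds = pd.read_hdf("serenade_predictions_v01.h5", key="ae_256")
#   print(rmse(preds), coverage(preds, k=2.0))
```

Positional access avoids guessing the exact column labels; if the labels in the file match the names in the description, selecting by name would be more readable.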

Created:

2025-01-01
