[Data] Prediction of soft proton intensities in the near-Earth space using machine learning

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4718560
Resource description:
1  Introduction

The dataset consists of four files:

File Name                               Type
RAPID_OMNI_ML_023_raw.h5                dataset
RAPID_OMNI_ML_023_traincut.h5           dataset
RAPID_OMNI_ML_023_testcut.h5            dataset
RAPID_OMNI_ML_023_robusttranscut.pkl    pre-fit scaler

1.1 Variables

The dataset files contain the following variables.

Variable Name                   Type      Unit              Description
p1, p2, p3, p4, p5, p6, p7      float     1/(s⋅cm²⋅sr⋅keV)  proton intensities
x, y, z                         float     RE                position in GSE coordinates
rdist                           float     RE                radial distance from the Earth
AE_index                        float     nT                Auroral Electrojet (AE) index
SYM-H_index                     float     nT                symmetric disturbance field in the horizontal direction
F107                            float     sfu               solar radio flux at 10.7 cm
BimfxGSE, BimfyGSE, BimfzGSE    float     nT                x, y and z components of the Interplanetary Magnetic Field in GSE coordinates
VxSW_GSE, VySW_GSE, VzSW_GSE    float     km/s              x, y and z components of the solar wind speed
NpSW                            float     n/cc              solar wind density
Temp                            float     K                 solar wind temperature
Pdyn                            float     nPa               solar wind dynamic pressure
DateTime                        datetime                    timestamp

1.2 Data Split Ranges

The datasets contain data from the following time ranges.

Dataset                          Start                  End                    Count
RAPID_OMNI_ML_023_raw.h5         2001-01-09 15:21:00    2018-02-19 09:57:00    6,051,937
RAPID_OMNI_ML_023_traincut.h5    2001-01-09 15:21:00    2014-07-24 22:44:00    4,524,200
RAPID_OMNI_ML_023_testcut.h5     2014-07-24 22:45:00    2018-02-19 09:57:00    1,173,865

2  Raw Data Preparation

2.1 Data Source

OMNIWeb (NASA)

From NASA/GSFC's OMNI data set through OMNIWeb, we extracted the following variables from 2001 to 2019:

Variable       Original Name in OMNIWeb    Resolution
AE_index       AE Index, nT                1-min
SYM-H_index    SYM/H, nT                   1-min
F107           Solar index F10.7           1-hour *
BimfxGSE       Bx, GSE/GSM, nT             1-min
BimfyGSE       By, GSE, nT                 1-min
BimfzGSE       Bz, GSE, nT                 1-min
VxSW_GSE       Vx Velocity, GSE, km/s      1-min
VySW_GSE       Vy Velocity, GSE, km/s      1-min
VzSW_GSE       Vz Velocity, GSE, km/s      1-min
NpSW           Proton Density, n/cc        1-min
Temp           Proton Temperature, K       1-min

* Solar index F10.7 is not available at higher resolution.

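Because F10.7 is only available hourly while the other OMNI variables come at 1-min resolution, each hourly value has to be repeated across the minutes of its hour before all variables can share one time index. The following is a minimal, dependency-free sketch of that idea. The function name `expand_hourly_to_minutes`, the dictionary layout and the sample values are illustrative assumptions, not part of the dataset or its tooling.

```python
from datetime import datetime, timedelta


def expand_hourly_to_minutes(hourly):
    """Repeat each hourly value for all 60 minutes of its 1-hour bin.

    `hourly` maps the start of an hour (datetime) to the value for that hour.
    Returns a dict keyed by the start of each minute.
    """
    per_minute = {}
    for hour_start, value in hourly.items():
        for minute in range(60):
            per_minute[hour_start + timedelta(minutes=minute)] = value
    return per_minute


# Made-up values for illustration only; they are not taken from OMNIWeb.
hourly_f107 = {
    datetime(2001, 1, 9, 15): 150.0,
    datetime(2001, 1, 9, 16): 152.5,
}
minute_f107 = expand_hourly_to_minutes(hourly_f107)
```

With a dataframe library the same effect is usually obtained by reindexing the hourly series onto a 1-min index with forward fill.
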
Cluster Science Archive (ESA)

Through the Cluster Archive Inter-Operability Subsystem (CAIO), we have access to the proton intensities in 7 energy channels measured by the RAPID instrument onboard the Cluster satellite. We obtained the following variables between 2001 and 2009 from the CDF files:

Variables                     Dataset ID in CAIO     Variable in CDF Files                Resolution
p1, p2, p3, p4, p5, p6, p7    C4_CP_RAP_HSPCT        Proton_Dif_flux__C4_CP_RAP_HSPCT     4127-ms
x_km, y_km, z_km *            C4_CP_AUX_POSGSE_1M    sc_r_xyz_gse__C4_CP_AUX_POSGSE_1M    1-min

* Positions in km are not included in the final dataset.

2.2 Custom Features

OMNIWeb (NASA)

Variable    Source
Pdyn        NpSW * (VxSW_GSE^2 + VySW_GSE^2 + VzSW_GSE^2) * 1.67e-6

Cluster Science Archive (ESA)

Variable    Source
x           x_km / 6371.1
y           y_km / 6371.1
z           z_km / 6371.1
rdist       sqrt(x^2 + y^2 + z^2)

2.3 Sampling and Interpolation

- We use the value of F107 at a 1-hour resolution to represent all values in each 1-hour bin.
- To integrate the sources, we resampled the proton intensities to a resolution of 1 minute. More specifically, we calculate the average proton intensity of each of the seven channels in each minute and use it to represent the value at the first second of that minute; e.g., the values at 2001/1/9 15:22:00 are calculated from the data from 15:22:00 to 15:22:59.

2.4 Integration

Since data from the different sources can now be aligned on DateTime at a resolution of 1 minute, we can merge them.

2.5 Cleaning

Finally, we dropped the rows with outliers (fill values) in any OMNI variable. Please refer to the description from OMNIWeb for more information about the fill values. The resulting raw data is available in the package under the name RAPID_OMNI_ML_023_raw.h5.

3  Experiment-specific Pre-processing

In addition, we applied the steps below to the dataset for our experiments.

3.1 Splitting

The dataset is split into a training set and a test set with a ratio of 8:2.

3.2 Filtering

- We filtered out the rows with rdist less than or equal to 6.
- We also replaced proton intensities less than or equal to a per-channel threshold with NaNs; the thresholds are 5, 1, 0.5, 0.1, 0.05, 0.005 and 0.001 for the 7 channels, respectively.

3.3 Transform

- We did not perform any transform or scaling directly on the data in pre-processing. Instead, a Robust Scaler fitted on the training data was dumped to a file and used in the experiment.

The pre-processed data and scaler are available under the names RAPID_OMNI_ML_023_traincut.h5, RAPID_OMNI_ML_023_testcut.h5 and RAPID_OMNI_ML_023_robusttranscut.pkl.
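
The derived features of Section 2.2 and the filtering rules of Section 3.2 can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the authors' code: the function names, the dictionary-based row layout and the sample values are hypothetical, and the thresholds are assumed to apply to p1 through p7 in the order listed.

```python
import math

EARTH_RADIUS_KM = 6371.1  # divisor used for x, y, z in Section 2.2

# Thresholds from Section 3.2, ASSUMED to be in channel order p1..p7.
THRESHOLDS = {"p1": 5, "p2": 1, "p3": 0.5, "p4": 0.1,
              "p5": 0.05, "p6": 0.005, "p7": 0.001}


def dynamic_pressure(np_sw, vx, vy, vz):
    """Pdyn in nPa from density (n/cc) and velocity components (km/s)."""
    return np_sw * (vx**2 + vy**2 + vz**2) * 1.67e-6


def position_features(x_km, y_km, z_km):
    """Convert a GSE position from km to Earth radii and add rdist."""
    x, y, z = (c / EARTH_RADIUS_KM for c in (x_km, y_km, z_km))
    return {"x": x, "y": y, "z": z, "rdist": math.sqrt(x**2 + y**2 + z**2)}


def filter_rows(rows):
    """Drop rows with rdist <= 6; mask intensities <= threshold with NaN."""
    kept = []
    for row in rows:
        if row["rdist"] <= 6:
            continue  # too close to the Earth, row is removed entirely
        cleaned = dict(row)  # copy so the input rows stay untouched
        for channel, threshold in THRESHOLDS.items():
            if cleaned[channel] <= threshold:
                cleaned[channel] = math.nan
        kept.append(cleaned)
    return kept


# Made-up sample rows for illustration only.
sample_rows = [
    {"rdist": 5.0, "p1": 9, "p2": 9, "p3": 9, "p4": 9, "p5": 9, "p6": 9, "p7": 9},
    {"rdist": 8.0, "p1": 10, "p2": 0.5, "p3": 2, "p4": 1, "p5": 1, "p6": 1, "p7": 1},
]
filtered = filter_rows(sample_rows)
```

In this sample the first row is dropped because its rdist is not greater than 6, and in the second row only p2 falls at or below its threshold and is replaced by NaN. The factor 1.67e-6 combines the proton mass with the unit conversions from n/cc and km/s to nPa.
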
Created: 2021-08-05