MohamedAzizBhouri/MF_RPN_convection_super_param_CAM5_SPCAM5
Source: Hugging Face. Updated 2023-10-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/MohamedAzizBhouri/MF_RPN_convection_super_param_CAM5_SPCAM5
Resource description:
---
license: mit
---
## Probabilistic Multi-fidelity climate model parameterization for better generalization and extrapolation
Code and data accompanying the manuscript titled "Multi-fidelity climate model parameterization for better generalization and extrapolation", authored by Mohamed Aziz Bhouri, Liran Peng, Michael S Pritchard and Pierre Gentine.
## Abstract
Machine-learning-based parameterizations (i.e. representation of sub-grid processes) of global climate models or turbulent simulations have recently been proposed as a powerful alternative to physical, but empirical, representations, offering a lower computational cost and higher accuracy. Yet, those approaches still suffer from a lack of generalization and extrapolation beyond the training data, which is however critical to projecting climate change or unobserved regimes of turbulence. Here we show that a multi-fidelity approach, which integrates datasets of different accuracy and abundance, can provide the best of both worlds: the capacity to extrapolate to warmer climates leveraging abundant low-fidelity data and a higher accuracy using resolving high-fidelity data. In an application to climate modeling, the multi-fidelity framework yields more accurate climate projections without requiring major increase in computational resources, while providing trustworthy uncertainty quantification across a wide range of scenarios. Our approach paves the way for the use of machine-learning based methods that can optimally leverage historical observations or high-fidelity simulations and extrapolate to unseen regimes such as climate change.
## Citation
@article{Bhouri2023MF_RPN_cv_param,
title = {Multi-fidelity climate model parameterization for better generalization and extrapolation},
author = {Bhouri, Mohamed Aziz and Peng, Liran and Pritchard, Michael S. and Gentine, Pierre},
journal = {arXiv preprint arXiv:2309.10231},
doi = {10.48550/arXiv.2309.10231},
year = {2023},
}
- The code was tested with jax 0.3.13, jaxlib 0.3.10, numpy 1.20.1 and scipy 1.7.2.
- All script names intentionally start with a number so that the order in which they must be run is easy to follow:
#####################################################################################################
1. Files "0_data_process_CAM5.py" and "0_data_process_SPCAM5.py" process the raw data generated by the CESM2.1.3 CAM5 and SPCAM5 models. Only the variables relevant to the problem of interest are kept, and the data is subsampled in time by a factor of 2. In addition, data is concatenated over several days to reduce the number of final files. The number of days concatenated is limited by the memory available on the hardware running the scripts. "0_data_process_CAM5.py" is used to process the CAM5 +4K and +8K data, and the resulting files are saved in folders "data_CAM5_4K" and "data_CAM5_8K" respectively. "0_data_process_SPCAM5.py" is used to process the SPCAM5 historical and +4K data, and the resulting files are saved in folders "data_SPCAM5_hist" and "data_SPCAM5_4K" respectively.
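The subsampling and multi-day concatenation described in point 1 can be sketched as follows. This is a minimal illustration with NumPy, not the code of the processing scripts: the function name `subsample_and_concat`, the array shapes and the assumption that time is the leading axis are all hypothetical.

```python
import numpy as np

def subsample_and_concat(daily_arrays, factor=2):
    """Keep every `factor`-th time step of each day, then join the days along time.

    Assumes (hypothetically) that each array has time as its leading axis.
    """
    subsampled = [day[::factor] for day in daily_arrays]
    return np.concatenate(subsampled, axis=0)

# Three hypothetical days, each with 48 time steps on a tiny 4 x 5 grid.
rng = np.random.default_rng(0)
days = [rng.random((48, 4, 5)) for _ in range(3)]
chunk = subsample_and_concat(days, factor=2)
print(chunk.shape)  # (72, 4, 5)
```

The number of days passed in one call plays the role of the memory-bound concatenation length mentioned above.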
#####################################################################################################
2. File "1_create_train_test.py" creates the train and test datasets with only the final relevant variables for the convection parameterization (see manuscript). Datasets are concatenated over the whole time period. The scripts in point 1 are needed because they run on the full GCM outputs, which are relatively expensive in terms of memory, so concatenating several months by loading all GCM outputs directly was not feasible on our hardware. This is why data concatenation is done in two steps. "1_create_train_test.py" creates the high-fidelity training (SPCAM5 historical run for 3 months) and testing (SPCAM5 +4K for a year) datasets. It also creates the two candidate low-fidelity training datasets (CAM5 +4K and +8K for a year).
#####################################################################################################
3. File "2_candle_plots_data_distr.py" shows the data distribution at the 5 pressure levels 137, 259, 494, 761 and 958 hPa for the heat tendency and specific humidity, and at the highest pressure level (lowest altitude) for the moisture tendency. It creates the candle plots of these data distributions that appear in the manuscript ("candle_plots_5_pr_lvls_heat_tend_and_spec_hum.png" and "candle_plots_1st_lvl_SS_moist_tend.png").
#####################################################################################################
4. File "2_norm.py" computes and saves the mean and standard deviation of the parameterization inputs and outputs based on the low-fidelity training data (CAM5 +8K simulation for a year) and the high-fidelity training data (SPCAM5 historical run for three months). The results are saved in folder "norm".
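As an illustration of point 4, per-feature normalization statistics can be computed along the lines below. This is a sketch under stated assumptions: the data layout `[n_samples, n_features]`, the function names and the small `eps` floor on the standard deviation are hypothetical and are not taken from "2_norm.py".

```python
import numpy as np

def compute_norm_stats(x, eps=1e-12):
    """Per-feature mean and standard deviation over the sample axis (axis 0).

    `eps` is a hypothetical safeguard against division by zero for constant features.
    """
    mean = x.mean(axis=0)
    std = np.maximum(x.std(axis=0), eps)
    return mean, std

def normalize(x, mean, std):
    """Standardize features with previously computed statistics."""
    return (x - mean) / std

# Hypothetical training inputs: 1000 samples, 6 features with different scales.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=3.0, scale=[1, 2, 5, 10, 0.1, 50], size=(1000, 6))
mean, std = compute_norm_stats(x_train)
x_norm = normalize(x_train, mean, std)
```

Saving `mean` and `std` once and reusing them at prediction time keeps training and test data on the same scale.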
#####################################################################################################
5. Files "3_train_RPN_MF.py" and "3_train_RPN_SF.py" train the multi- and single-fidelity models and save their parameters in folders "MF_param" and "SF_param" respectively. The number of models trained in parallel by a single run of either script is set by the variable "ensemble_size". Given the available hardware, we had to use "ensemble_size=1" since we could only access single nodes, and we varied "n_run_param" from 0 to 127. However, we were able to access multiple single nodes independently, so the training was ultimately conducted in parallel. "3_train_RPN_SF.py" is also used to train the deterministic model by setting the variable "N_rpn_SF" equal to "N_tot_SF" in order to use all training data and by changing the subfolder within "SF_param" where the parameters are saved.
#####################################################################################################
6. File "4_concat_param.py" concatenates the parameters so that they match the parameters that would be saved if 128 NNs were trained with a single run of the scripts detailed in point 5. The resulting individual files can be as large as 134 MB, which prevents uploading them to GitHub directly, but we wanted to show that a concise parameter representation for RPN is possible. Subsequent scripts use the parameters that were saved separately for each individual RPN member (resulting from point 5 above).
#####################################################################################################
7. File "4_pred_RPN_det.py" computes and saves the deterministic prediction for the test dataset. Files "4_pred_RPN_SF.py", "4_pred_RPN_LF.py" and "4_pred_RPN_MF.py" compute and save the predictions for the test dataset obtained with each individual member of SF-RPN, LF-RPN and MF-RPN. We had to perform this step since our hardware did not have enough virtual memory to make the ensemble predictions for 128 million test datapoints. If memory allows, the ensemble predictions can be performed at once by setting the variable "ensemble_size" to the actual ensemble size and then computing the related statistics (mean, standard deviation, higher-order moments, etc.).
#####################################################################################################
8. Files "5_mean_std_RPN_SF.py", "5_mean_std_RPN_LF.py" and "5_mean_std_RPN_MF.py" compute and save the mean and standard deviation of the ensemble predictions for the test dataset that were computed and saved in point 7 above. As mentioned above, if memory allows, points 7 and 8 can be merged into one step.
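When the full ensemble does not fit in memory, the mean and standard deviation from points 7 and 8 can be accumulated one member at a time. The sketch below uses Welford's online algorithm, which is a swapped-in technique chosen for illustration and not necessarily what the "5_mean_std_RPN_*.py" scripts do; the function name and the toy shapes are hypothetical.

```python
import numpy as np

def streaming_mean_std(member_predictions):
    """Ensemble mean and (population) std from an iterable of per-member predictions.

    Only one member is held in memory at a time (Welford's online update).
    """
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for pred in member_predictions:
        pred = np.asarray(pred, dtype=np.float64)
        if mean is None:
            mean = np.zeros_like(pred)
            m2 = np.zeros_like(pred)
        count += 1
        delta = pred - mean
        mean += delta / count
        m2 += delta * (pred - mean)
    if count == 0:
        raise ValueError("no ensemble members were provided")
    return mean, np.sqrt(m2 / count)  # divide by count - 1 for the sample std

# Toy ensemble: 8 members, 10 test points, 3 outputs (hypothetical sizes).
rng = np.random.default_rng(0)
members = rng.normal(size=(8, 10, 3))
ens_mean, ens_std = streaming_mean_std(iter(members))
```

In practice the iterable would yield each member's saved prediction file in turn instead of slices of an in-memory array.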
#####################################################################################################
9. File "6_reshape_pred_RPN.py" reshapes and saves the deterministic NN prediction for the test dataset, and the mean and standard deviation of the ensemble predictions for the test dataset for the SF-RPN, LF-RPN and MF-RPN models. It uses the saved predictions from point 8 and from running the script "4_pred_RPN_det.py" in point 7. File "6_reshape_pred_RPN.py" also reshapes and saves the actual test dataset output. The reshaped tensors have shape [dim_y x Nt x lat x lon], where dim_y = 48 is the output dimension, Nt the total number of time steps for the test dataset, lat = 96 the number of latitude points and lon = 144 the number of longitude points. These results are saved in folders "data_SPCAM5_4K", "MF_param" and "SF_param".
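The target layout [dim_y x Nt x lat x lon] can be illustrated with the sketch below. It assumes, hypothetically, that the flat predictions are stored as `[Nt * lat * lon, dim_y]` with time varying slowest, then latitude, then longitude; the actual ordering used by "6_reshape_pred_RPN.py" may differ, in which case the `reshape` and `transpose` arguments would need to change.

```python
import numpy as np

def reshape_predictions(flat, n_lat=96, n_lon=144):
    """Reshape [Nt * lat * lon, dim_y] predictions to [dim_y, Nt, lat, lon].

    Assumes samples are ordered with time slowest, then latitude, then longitude.
    """
    n_samples, dim_y = flat.shape
    n_t, remainder = divmod(n_samples, n_lat * n_lon)
    if remainder:
        raise ValueError("number of samples is not a multiple of lat * lon")
    return flat.reshape(n_t, n_lat, n_lon, dim_y).transpose(3, 0, 1, 2)

# Small hypothetical example: Nt = 2, lat = 3, lon = 4, dim_y = 5.
flat = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2 * 3 * 4, 5)
grid = reshape_predictions(flat, n_lat=3, n_lon=4)
print(grid.shape)  # (5, 2, 3, 4)
```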
#####################################################################################################
10. File "7_global_errors_temporal_errors.py" computes and saves global errors (if is_glob_err = 1) and temporal errors (if is_temp_MAE = 1 and/or is_temp_r2 = 1) for all models (det NN, SF-RPN, MF-RPN and LF-RPN). Global errors are saved in folder "glob_errors". Temporal errors are plotted and saved in folder "temp_plots". File "7_global_errors_temporal_errors.py" uses the results obtained in point 9.
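For reference, MAE and R2 of the kind mentioned in point 10 can be computed as in the sketch below, using the standard definitions (R2 as one minus the ratio of residual to total sum of squares). The function name and the choice of reduction axis are assumptions; the script may aggregate over different dimensions.

```python
import numpy as np

def mae_and_r2(y_true, y_pred, axis=0):
    """Mean absolute error and coefficient of determination along `axis`."""
    mae = np.mean(np.abs(y_pred - y_true), axis=axis)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=axis)
    ss_tot = np.sum((y_true - y_true.mean(axis=axis, keepdims=True)) ** 2, axis=axis)
    return mae, 1.0 - ss_res / ss_tot

# Hypothetical targets: 100 samples, 3 outputs.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 3))
y_pred = y_true + 0.1 * rng.normal(size=(100, 3))
mae, r2 = mae_and_r2(y_true, y_pred)
```

Reducing over the sample axis gives one score per output; reducing over all axes except time would give the temporal errors instead.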
#####################################################################################################
11. File "7_global_crps.py" computes and saves the CRPS scores for SF-RPN, MF-RPN and LF-RPN. The individual predictions within the ensemble of each model must first be reshaped by setting "is_reshape_single_pred = 1"; the corresponding CRPS score is then computed and saved in folder "glob_errors" by setting "is_reshape_single_pred = 0".
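A common way to estimate the CRPS from a finite ensemble is the empirical-CDF form, CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the ensemble. The sketch below implements that estimator with NumPy. Whether "7_global_crps.py" uses this exact estimator is an assumption, and the function name is hypothetical.

```python
import numpy as np

def ensemble_crps(ensemble, obs):
    """Empirical CRPS for an ensemble of shape [M, ...] against obs of shape [...]."""
    ensemble = np.asarray(ensemble, dtype=np.float64)
    obs = np.asarray(obs, dtype=np.float64)
    n_members = ensemble.shape[0]
    # First term: mean absolute difference between members and the observation.
    accuracy = np.mean(np.abs(ensemble - obs), axis=0)
    # Second term: mean absolute difference between all pairs of members,
    # accumulated member by member to avoid an M x M x ... intermediate array.
    spread = np.zeros_like(accuracy)
    for member in ensemble:
        spread += np.mean(np.abs(ensemble - member), axis=0)
    spread /= n_members
    return accuracy - 0.5 * spread

# Two-member ensemble {0, 2} with observation 1.
print(ensemble_crps(np.array([0.0, 2.0]), 1.0))  # 0.5
```

For a single-member ensemble the spread term vanishes and the score reduces to the absolute error, which makes the CRPS directly comparable to the MAE of a deterministic model.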
#####################################################################################################
12. File "7_long_lat_errors.py" computes and saves the longitude-latitude variations of MAE and R2 for all models (det NN, SF-RPN, MF-RPN and LF-RPN) in folders "MF_results" and "SF_results" using the results obtained in point 9.
#####################################################################################################
13. File "7_pressure_lat_errors" computes and saves the pressure(altitude)-latitude variations of MAE and R2 for all models (det NN, SF-RPN, MF-RPN and LF-RPN) in folders "MF_results" and "SF_results" using the results obtained in point 9.
#####################################################################################################
14. File "8_plot_global_errors.py" creates the plots for the global errors (MAE, R2 and CRPS) for all models (det NN, SF-RPN, MF-RPN and LF-RPN) using the results obtained in points 10 and 11. The plots are saved in folder "glob_errors".
#####################################################################################################
15. File "8_long_lat_plots.py" creates and saves the plots for the longitude-latitude variations of MAE and R2 for all models (det NN, SF-RPN, MF-RPN and LF-RPN) in folder "long_lat_plots" if variable "is_uncert = 0". These plots are based on the results obtained in point 12. File "8_long_lat_plots.py" also creates the plots for the longitude-latitude variations of the uncertainty for SF-RPN, MF-RPN and LF-RPN models if variable "is_uncert = 1". These plots are saved in folder "long_lat_uncert_plots" and are based on results obtained in point 9.
#####################################################################################################
16. File "8_pressure_lat_plots" creates and saves the plots for the pressure(altitude)-latitude variations of R2 for all models (det NN, SF-RPN, MF-RPN and LF-RPN) under the names "r2_press_lat_heat.png" and "r2_press_lat_moist.png" for heat and moisture tendencies respectively. These plots are based on the results obtained in point 13.
#####################################################################################################
17. File "8_uncertainty_density_plot" creates the plots for the density of uncertainty as a function of error for SF-RPN, MF-RPN and LF-RPN models. These plots are saved in folder "uncertainty_density_plots" and are based on results obtained in point 9.
#####################################################################################################
18. File "9_uncertainty_video.py" creates and saves the videos of the complete spatio-temporal evolution of MAEs and returned uncertainties for the heat and moisture tendencies by different models (MF-RPN, LF-RPN and SF-HF-RPN) at vertical levels 259, 494 and 761 hPa. The videos are saved in folder "videos". File "9_uncertainty_video.py" uses the results obtained in point 9.
#####################################################################################################
19. File "9_uncertainty_video_daily.py" creates and saves the videos of the spatio-temporal evolution of MAEs based on daily-averaged predictions and daily-averaged returned uncertainties for the heat and moisture tendencies by different models (MF-RPN, LF-RPN and SF-HF-RPN) at vertical levels 259, 494 and 761 hPa. The videos are saved in folder "videos". File "9_uncertainty_video_daily.py" uses the results obtained in point 9.
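The daily averaging used in point 19 can be sketched as a reshape followed by a mean over the within-day axis. This is only an illustration: the function name, the `[Nt, lat, lon]` layout and the value of `steps_per_day` are hypothetical and depend on the output frequency of the processed data.

```python
import numpy as np

def daily_average(field, steps_per_day):
    """Average a [Nt, lat, lon] field over consecutive blocks of `steps_per_day` steps.

    Trailing time steps that do not fill a complete day are dropped.
    """
    n_days = field.shape[0] // steps_per_day
    trimmed = field[: n_days * steps_per_day]
    blocks = trimmed.reshape(n_days, steps_per_day, *field.shape[1:])
    return blocks.mean(axis=1)

# Hypothetical field: 50 time steps on a 3 x 4 grid, 24 steps per day.
rng = np.random.default_rng(0)
field = rng.random((50, 3, 4))
daily = daily_average(field, steps_per_day=24)
print(daily.shape)  # (2, 3, 4)
```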
Provider:
MohamedAzizBhouri
Summary of original information
Dataset overview
Dataset name
Probabilistic Multi-fidelity climate model parameterization for better generalization and extrapolation
Dataset description
This dataset accompanies the paper "Multi-fidelity climate model parameterization for better generalization and extrapolation" by Mohamed Aziz Bhouri, Liran Peng, Michael S Pritchard and Pierre Gentine. It supports multi-fidelity climate model parameterization aimed at better generalization and extrapolation.
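Data processing steps
- Data preprocessing:
  - Files "0_data_process_CAM5.py" and "0_data_process_SPCAM5.py" process the raw data generated by the CESM2.1.3 CAM5 and SPCAM5 models.
  - The variables relevant to the problem are selected, the data is subsampled in time by a factor of 2, and data is concatenated over several days to reduce the number of final files.
  - The processed data is saved in folders "data_CAM5_4K" and "data_CAM5_8K" (CAM5 data) and in folders "data_SPCAM5_hist" and "data_SPCAM5_4K" (SPCAM5 data).
- Creating the train and test datasets:
  - File "1_create_train_test.py" creates the train and test datasets with only the final relevant variables.
  - It creates the high-fidelity training dataset (SPCAM5 historical run for 3 months) and the testing dataset (SPCAM5 +4K run for 1 year), as well as two low-fidelity training datasets (CAM5 +4K and +8K runs for 1 year).
- Visualizing the data distribution:
  - File "2_candle_plots_data_distr.py" shows the data distribution of the heat tendency and specific humidity at 5 pressure levels (137, 259, 494, 761 and 958 hPa) and creates the corresponding candle plots.
- Computing the normalization mean and standard deviation:
  - File "2_norm.py" computes and saves the mean and standard deviation of the parameterization inputs and outputs based on the low-fidelity training data (CAM5 +8K simulation for 1 year) and the high-fidelity training data (SPCAM historical run for 3 months).
- Model training:
  - Files "3_train_RPN_MF.py" and "3_train_RPN_SF.py" train the multi- and single-fidelity models and save their parameters in folders "MF_param" and "SF_param".
- Parameter concatenation:
  - File "4_concat_param.py" concatenates the parameters so that they correspond to the parameters of 128 trained NNs.
- Computing predictions:
  - File "4_pred_RPN_det.py" computes and saves the deterministic prediction for the test dataset.
  - Files "4_pred_RPN_SF.py", "4_pred_RPN_LF.py" and "4_pred_RPN_MF.py" compute and save the test dataset predictions of SF-RPN, LF-RPN and MF-RPN.
- Computing the ensemble mean and standard deviation:
  - Files "5_mean_std_RPN_SF.py", "5_mean_std_RPN_LF.py" and "5_mean_std_RPN_MF.py" compute and save the mean and standard deviation for the test dataset.
- Reshaping the predictions:
  - File "6_reshape_pred_RPN.py" reshapes and saves the deterministic NN prediction and the mean and standard deviation of the SF-RPN, LF-RPN and MF-RPN models.
- Computing global and temporal errors:
  - File "7_global_errors_temporal_errors.py" computes and saves the global and temporal errors.
- Computing CRPS scores:
  - File "7_global_crps.py" computes and saves the CRPS scores for SF-RPN, MF-RPN and LF-RPN.
- Computing longitude-latitude errors:
  - File "7_long_lat_errors.py" computes and saves the longitude-latitude errors for all models.
- Computing pressure-latitude errors:
  - File "7_pressure_lat_errors" computes and saves the pressure-latitude errors for all models.
- Plotting global errors:
  - File "8_plot_global_errors.py" creates and saves the global error plots.
- Plotting longitude-latitude errors:
  - File "8_long_lat_plots.py" creates and saves the longitude-latitude error plots.
- Plotting pressure-latitude errors:
  - File "8_pressure_lat_plots" creates and saves the pressure-latitude error plots.
- Plotting the uncertainty density:
  - File "8_uncertainty_density_plot" creates and saves the uncertainty density plots.
- Creating uncertainty videos:
  - File "9_uncertainty_video.py" creates and saves videos of the spatio-temporal evolution of MAEs and uncertainties.
- Creating daily uncertainty videos:
  - File "9_uncertainty_video_daily.py" creates and saves videos of MAEs based on daily-averaged predictions and uncertainties.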



