
Can Ingredients-Based Forecasting be Learned? Disentangling a Random Forest's Severe Weather Predictions

Mendeley Data | Updated 2024-05-13 | Indexed 2024-06-29
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.0rxwdbs7w
Resource description:
# Data for: Can Ingredients-Based Forecasting be Learned? Disentangling a Random Forest's Severe Weather Predictions

[https://doi.org/10.5061/dryad.0rxwdbs7w](https://doi.org/10.5061/dryad.0rxwdbs7w)

Data for: Mazurek, Alexandra C., Aaron J. Hill, Russ S. Schumacher, and Hanna J. McDaniel: "Can Ingredients-Based Forecasting be Learned? Disentangling a Random Forest's Severe Weather Predictions", Weather and Forecasting.

This dataset includes day 2, 3, and 4 forecasts from the machine learning-based prediction system detailed in the associated manuscript (cited above), the corresponding outlooks from the Storm Prediction Center (SPC), and observations (local storm reports) of severe thunderstorm hazards. Forecasts, outlooks, and observations for each forecast day (day 2, day 3, day 4) are contained in a single netCDF file. For the day 2 and 3 forecasts, the netCDF files contain three separate machine learning-based forecasts for tornado, wind, and hail hazards; the day 4 files contain one forecast for "any severe" hazard (tornado, hail, or wind).

The feature contribution files contain the tree interpreter-derived contributions to the machine learning-based forecasts that are in the aforementioned netCDF files. The tree interpreter approach is described in the associated manuscript. Similar to the forecast files, there are separate netCDF files for each of the forecast lead times and hazard types (7 total: day 2 hail, wind, and tornado; day 3 hail, wind, and tornado; and day 4 any severe), with each containing the contributions associated with the forecasts for the respective lead time.
## Description of the data and file structure

## Forecast data

There are 4 netCDF files containing the forecast data for the day 2, day 3 (two files, see forecast data notes below for details) and day 4 forecasts:

* csu_mlp_2021_tor_day2.nc
* csu_mlp_2021_tor_day3.nc
* csu_mlp_2021_severe_day3.nc
* csu_mlp_2021_severe_day4.nc

The forecast netCDF files are structured as follows (example of the day 2 forecasts: csu_mlp_2021_tor_day2.nc):

### dimensions:

* lat = 56 ;
* lon = 139 ;
* time = 920 ;

### coordinates:

* float lat(lat) ;
* float lon(lon) ;
* datetime64 time(time) ;
* time: units = hours ;
* time: format = '%Y-%m-%d%H' ;
* time: timezone = 'UTC' ;
* time: calendar = "gregorian" ;

### variables:

##### CSU-MLP day 2 tornado machine learning-based forecasts

* float csu_mlp_2021_tor_day2(time, lat, lon) ;
* csu_mlp_2021_tor_day2:grid_type = "Latitude/longitude" ;
* csu_mlp_2021_tor_day2:initial_time = "10/04/2020 (12:00)" ;
* csu_mlp_2021_tor_day2:first_init = "20201004" ;
* csu_mlp_2021_tor_day2:valid_day_end = "20230412" ;
* csu_mlp_2021_tor_day2:version = "csu_mlp_2021_tor_day2" ;

##### SPC day 2 tornado outlook probabilities

* float day2otlk_netcdf_torn_fine_single_only(time, lat, lon) ;
* day2otlk_netcdf_torn_fine_single_only:first_init = "20201004" ;
* day2otlk_netcdf_torn_fine_single_only:valid_day_end = "20230412" ;
* day2otlk_netcdf_torn_fine_single_only:version = "day2otlk_netcdf_torn_fine_single_only" ;

##### gridded hail reports

* float hail_gridded(time, lat, lon) ;
* hail_gridded:first_init = "20201004" ;
* hail_gridded:valid_day_end = "20230412" ;

##### gridded wind reports

* float wind_gridded(time, lat, lon) ;
* wind_gridded:first_init = "20201004" ;
* wind_gridded:valid_day_end = "20230412" ;

##### gridded tornado reports

* float tor_gridded(time, lat, lon) ;
* tor_gridded:first_init = "20201004" ;
* tor_gridded:valid_day_end = "20230412" ;

##### CSU-MLP day 2 wind machine learning-based forecasts

* float csu_mlp_2021_wind_day2(time, lat, lon) ;
* csu_mlp_2021_wind_day2:grid_type = "Latitude/longitude" ;
* csu_mlp_2021_wind_day2:initial_time = "10/04/2020 (12:00)" ;
* csu_mlp_2021_wind_day2:first_init = "20201004" ;
* csu_mlp_2021_wind_day2:valid_day_end = "20230412" ;
* csu_mlp_2021_wind_day2:version = "csu_mlp_2021_wind_day2" ;

##### SPC day 2 wind outlook probabilities

* float day2otlk_netcdf_wind_fine_single_only(time, lat, lon) ;
* day2otlk_netcdf_wind_fine_single_only:first_init = "20201004" ;
* day2otlk_netcdf_wind_fine_single_only:valid_day_end = "20230412" ;
* day2otlk_netcdf_wind_fine_single_only:version = "day2otlk_netcdf_wind_fine_single_only" ;

##### CSU-MLP day 2 hail machine learning-based forecasts

* float csu_mlp_2021_hail_day2(time, lat, lon) ;
* csu_mlp_2021_hail_day2:grid_type = "Latitude/longitude" ;
* csu_mlp_2021_hail_day2:initial_time = "10/04/2020 (12:00)" ;
* csu_mlp_2021_hail_day2:first_init = "20201004" ;
* csu_mlp_2021_hail_day2:valid_day_end = "20230412" ;
* csu_mlp_2021_hail_day2:version = "csu_mlp_2021_hail_day2" ;

##### SPC day 2 hail outlook probabilities

* float day2otlk_netcdf_hail_fine_single_only(time, lat, lon) ;
* day2otlk_netcdf_hail_fine_single_only:first_init = "20201004" ;
* day2otlk_netcdf_hail_fine_single_only:valid_day_end = "20230412" ;
* day2otlk_netcdf_hail_fine_single_only:version = "day2otlk_netcdf_hail_fine_single_only" ;

### Forecast Data Notes

All date/times represent the end of the 24-h period for which the forecast, outlook, or reports are valid. For example, the data for the date '2021-03-20 12:00' in the file would correspond to:

* a day 2 CSU-MLP forecast initialized with GEFS model data that was initialized 2021-03-18 00:00 UTC
* a day 2 SPC outlook issued 2021-03-18
* reports occurring between 2021-03-19 12:00 UTC and 2021-03-20 12:00 UTC

All CSU-MLP forecasts in this dataset are initialized with data from the 0000 UTC run of the operational Global Ensemble Forecast System (GEFS). The 0600 UTC SPC convective outlook is used in the dataset for the day 2 period, and the 0730 UTC outlook is used for the day 3 period.
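The date convention in the forecast data notes can be sketched as simple time arithmetic. The snippet below is a minimal illustration, not code from the dataset authors: the function name `forecast_window` and the table `END_LEAD_HOURS` are hypothetical, the 60-h offset for day 2 follows from the example above, and the offsets for days 3 and 4 are assumptions obtained by adding 24 h per additional lead day.

```python
from datetime import datetime, timedelta

# Hours from the 0000 UTC GEFS initialization to the END of the 24-h valid period.
# Day 2 (60 h) follows from the README example; days 3 and 4 are assumed
# (one extra 24-h period per additional lead day).
END_LEAD_HOURS = {2: 60, 3: 84, 4: 108}


def forecast_window(valid_end: datetime, day: int) -> dict:
    """Map a value of the `time` coordinate (end of the valid period, UTC)
    to the GEFS initialization time and the 24-h report window."""
    if day not in END_LEAD_HOURS:
        raise ValueError("day must be 2, 3, or 4")
    return {
        "gefs_init": valid_end - timedelta(hours=END_LEAD_HOURS[day]),
        "reports_start": valid_end - timedelta(hours=24),
        "reports_end": valid_end,
    }


# The README example: time coordinate value '2021-03-20 12:00', day 2 forecast.
window = forecast_window(datetime(2021, 3, 20, 12), day=2)
print(window["gefs_init"])      # 2021-03-18 00:00:00
print(window["reports_start"])  # 2021-03-19 12:00:00
```

Keeping the offsets in one lookup table makes it easy to correct the assumed day 3 and day 4 values against the manuscript if they differ.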
For the day 4 forecasts (in the file titled "csu_mlp_2021_severe_day4.nc"), the CSU-MLP system does not generate individual forecasts for tornadoes, wind, and hail; only one set of probabilities for "any severe" hazard (i.e., tornado, wind, or hail) is generated. SPC also does not issue forecasts for individual severe hazards at this lead time (only one forecast for "any severe" hazard). The variable names for the CSU-MLP forecasts and SPC forecasts at this lead time are 'csu_mlp_2021_severe_day4' and 'day4otlk_netcdf_prob_fine_single_only', respectively. Gridded reports for tornadoes, wind, and hail are still included as separate variables with the same variable names as in the day 2 file.

The forecast file containing the day 3 CSU-MLP forecasts for the individual tornado, wind, and hail hazards ("csu_mlp_2021_tor_day3.nc") does not contain fields for SPC forecasts, as there are no SPC probabilities issued for individual hazards at this lead time (only "any severe"). The day 3 SPC outlooks for "any severe" can be found in the file titled "csu_mlp_2021_severe_day3.nc". This file also contains CSU-MLP forecasts for "any severe" hazard (variable name 'csu_mlp_2021_severe_day3', not analyzed in this study).

## Tree Interpreter Feature Contribution files

There are 7 netCDF files containing the feature contributions for the CSU-MLP machine learning-based forecasts. There is one file of contributions for each forecast hazard and lead time used in the study:

* day2h_TIcontributions_2021_to_2022.nc
* day2t_TIcontributions_2021_to_2022.nc
* day2w_TIcontributions_2021_to_2022.nc
* day3h_TIcontributions_2021_to_2022.nc
* day3t_TIcontributions_2021_to_2022.nc
* day3w_TIcontributions_2021_to_2022.nc
* day4_TIcontributions_2021_to_2022.nc

The feature contribution files are structured in the same way for all forecast hazard types and lead times.
The general structure is as follows:

### dimensions:

* lat = 56 ;
* lon = 139 ;
* vars = 15 ; (note this dimension is 12 in the feature contributions for the day 4 forecasts only)
* hours = 9 ;
* init_date = 727 ;

### coordinates:

* float lat(lat) ;
* float lon(lon) ;
* object vars(vars) ;
* int hours(hours) ;
* int cats() ;
* datetime64 init_date(init_date) ;
* init_date: units = hours ;
* init_date: format = '%Y-%m-%d%H' ;
* init_date: timezone = 'UTC' ;
* init_date: calendar = "gregorian" ;

### variables:

* float contributions(init_date, lat, lon, vars, hours) ;
* contributions:grid_type = "Latitude/longitude" ;
* contributions:initial_time = "01/01/2021 (00:00)" ;
* contributions:first_init = "20210101" ;
* contributions:valid_day_end = "20221231" ;

### Feature Contributions Data Notes

All date/times (init_date coordinate) represent the initialization date/time of the 24-h forecast that the feature contributions are associated with. For example, the feature contributions data for the init_date '2021-03-20 00:00' in the file would correspond to a day 2 CSU-MLP forecast that is initialized with data from 2021-03-20 00:00 UTC and is valid from 2021-03-21 12:00 UTC to 2021-03-22 12:00 UTC. Feature contributions data are missing for the following dates: 2021-02-04, 2021-03-04, and 2021-12-12.

The coordinate "vars" is short for variables. These are the environmental variables that are considered in the CSU-MLP machine learning-based forecasts. Variable names are abbreviated in the dataset, and the full description of each variable can be found in Table 1 of the manuscript associated with this dataset.

The coordinate "hours" represents feature contributions at 3-hour timestamps within the 24-h forecast period. For example, for a day 2 forecast, the feature contributions at hour "0" would correspond to forecast hour 36 (1200 UTC at the start of the forecast period), hour "1" to forecast hour 39 (1500 UTC), hour "2" to forecast hour 42 (1800 UTC), and so on up to hour "8", which would correspond to forecast hour 60 (1200 UTC at the end of the forecast period).

The coordinate "cats" is short for categories. This is a placeholder variable that is an artifact of the dataset being pared down to reduce file size. The CSU-MLP model system makes three types of forecasts for each hazard/lead time: 0=no severe, 1=severe, and 2=significant severe. Only the feature contributions for the "severe" forecasts (category 1) are included in these files. Feature contributions associated with forecasts of the no severe and significant severe categories can be provided upon request.

## Sharing/Access information

Data was derived from the following sources:

* SPC outlooks are available via a public archive at [https://www.spc.noaa.gov/](https://www.spc.noaa.gov/).
* Severe weather reports are available from the Severe Weather Database at [https://www.spc.noaa.gov/wcm/](https://www.spc.noaa.gov/wcm/)
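Returning to the "hours" coordinate of the feature contribution files, the mapping from hour index to forecast hour and valid time can be written out explicitly. This is a minimal sketch under stated assumptions: the names `contribution_time`, `expected_init_dates`, and `START_LEAD_HOURS` are hypothetical, the day 2 starting forecast hour (36) comes from the notes above, and the day 3 and day 4 starting hours are assumed to be 24 h later per lead day.

```python
from datetime import date, datetime, timedelta

# Forecast hour at index 0 of the "hours" coordinate.
# Day 2 (36) is from the README; days 3 and 4 are assumptions.
START_LEAD_HOURS = {2: 36, 3: 60, 4: 84}

# Init dates listed in the README as having no feature contributions.
MISSING_INIT_DATES = {date(2021, 2, 4), date(2021, 3, 4), date(2021, 12, 12)}


def contribution_time(init_date: datetime, day: int, hour_index: int):
    """Return (forecast_hour, valid_time) for one index of the "hours" coordinate."""
    if not 0 <= hour_index <= 8:
        raise ValueError("hour_index must be between 0 and 8")
    forecast_hour = START_LEAD_HOURS[day] + 3 * hour_index  # 3-hourly steps
    return forecast_hour, init_date + timedelta(hours=forecast_hour)


def expected_init_dates():
    """Daily init dates from 2021-01-01 to 2022-12-31, minus the missing ones."""
    first, last = date(2021, 1, 1), date(2022, 12, 31)
    n_days = (last - first).days + 1
    all_days = (first + timedelta(days=i) for i in range(n_days))
    return [d for d in all_days if d not in MISSING_INIT_DATES]


# README example: init_date '2021-03-20 00:00', day 2 forecast.
start = contribution_time(datetime(2021, 3, 20, 0), day=2, hour_index=0)
end = contribution_time(datetime(2021, 3, 20, 0), day=2, hour_index=8)
print(start[0], end[0])            # 36 60
print(len(expected_init_dates()))  # 727
```

The count of 727 agrees with the `init_date` dimension listed in the file structure, which suggests the three missing dates are simply absent from the coordinate rather than filled with placeholder values; that reading is an inference and is worth confirming against the files.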
Created:
2024-05-09