Assessing the Impact of Biased Target Variables on Machine Learning Models of Severe Hail
Published in: Weather and Forecasting
Source: NOAA Institutional Repository | Updated: 2025-11-14 | Indexed: 2026-04-25
Download link:
https://doi.org/10.1175/WAF-D-24-0051.1
Resource description:
This study examines the implications of using traditional local storm reports (LSRs) versus the radar-derived Multi-Radar Multi-Sensor (MRMS) maximum estimated size of hail (MESH) as classification target variables for training and evaluating machine learning (ML) models that predict severe hail events. Using input data from the NSSL Warn-on-Forecast System (WoFS), we explore how the LSR and MESH severe hail climatologies compare in WoFS and how model performance varies with the choice of target variable for training and testing. Regardless of the training target variable, all ML models performed better when evaluated on MESH. The improved performance of the LSR-trained model on MESH was attributed to MESH better capturing nighttime events, which reduced spurious false alarms compared with evaluating on LSRs only. However, the best model for a given target variable was the one trained on that target variable. For example, when evaluating on LSRs, the LSR-trained model performed best. This has operational significance, as MESH-trained models may underperform LSR-trained models if the target variable is LSRs. We attribute the better MESH scores to MESH being more spatially and temporally consistent with WoFS than LSRs are. Nevertheless, whether either approach better predicts severe hail occurrence is still to be determined. Finally, combining MESH and LSRs did not significantly improve model performance, which may be because the two datasets have unique error sources that do not cancel out. Ultimately, the main goal of this study is to shed light on the broader implications of data choice in the training and verification of ML models.
Grant no. NA2OAR4320204; Grant no. NA22OAR4590171
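The train-on-one-target, evaluate-on-another comparison described in the abstract can be pictured as a small score matrix. The following is a minimal sketch, not the authors' method: the abstract does not name its verification metrics, so the critical success index (CSI) is assumed here as a common choice for rare-event forecasts; the function names (`contingency`, `csi`, `cross_evaluate`) are hypothetical; and the label and prediction lists are made-up toy values that do not represent the study's data or results.

```python
from itertools import product


def contingency(pred, obs):
    """Count hits, misses, and false alarms for paired binary forecasts."""
    hits = sum(1 for p, o in zip(pred, obs) if p and o)
    misses = sum(1 for p, o in zip(pred, obs) if not p and o)
    false_alarms = sum(1 for p, o in zip(pred, obs) if p and not o)
    return hits, misses, false_alarms


def csi(pred, obs):
    """Critical success index: hits / (hits + misses + false alarms)."""
    hits, misses, false_alarms = contingency(pred, obs)
    denom = hits + misses + false_alarms
    # With no observed or predicted events the score is undefined.
    return hits / denom if denom else float("nan")


def cross_evaluate(predictions, targets):
    """Score every (training target, evaluation target) pair."""
    return {
        (trained_on, evaluated_on): csi(predictions[trained_on], targets[evaluated_on])
        for trained_on, evaluated_on in product(predictions, targets)
    }


# Toy, made-up severe-hail labels for eight forecast cases (1 = severe hail).
targets = {
    "LSR": [1, 0, 0, 1, 0, 0, 0, 0],
    "MESH": [1, 1, 0, 1, 0, 1, 0, 0],
}

# Toy, made-up binary predictions from a model trained on each target.
predictions = {
    "LSR": [1, 0, 0, 1, 1, 0, 0, 0],
    "MESH": [1, 1, 0, 1, 0, 0, 1, 0],
}

scores = cross_evaluate(predictions, targets)
for (trained_on, evaluated_on), score in sorted(scores.items()):
    print(f"trained on {trained_on:4s} | evaluated on {evaluated_on:4s} | CSI = {score:.2f}")
```

Keeping the training target and the evaluation target as separate keys makes it explicit that a score belongs to the pair, not to the model alone, which is the point the abstract makes about data choice in verification.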
Provider:
NOAA
Created:
2025-11-14



