
Dataset associated with "Forecasting Excessive Rainfall with Random Forests and a Deterministic Convection-Allowing Model"

Mendeley Data | Updated 2024-01-31 | Indexed 2024-06-29
Download link:
https://mountainscholar.org/handle/10217/233672
Description:
Approximately seven years of daily initializations from the convection-allowing National Severe Storms Laboratory Weather Research and Forecasting Model are used as inputs to train random forest (RF) machine learning models to probabilistically predict instances of excessive rainfall. Unlike other hazards, excessive rainfall does not have an accepted definition, so multiple definitions of excessive rainfall and flash flooding (including flash flood reports and 24-h average recurrence intervals, ARIs) are used to explore RF configuration forecast sensitivities. RF forecasts are analogous to operational Weather Prediction Center (WPC) day-1 Excessive Rainfall Outlooks (EROs), and their resolution, reliability, and skill are strongly influenced by rainfall definitions and how inputs are assembled for training. Models trained with 1-yr ARI exceedances defined by the Stage-IV (ST4) precipitation analysis perform poorly in the northern Great Plains and Southwest United States, in part due to a high bias in the number of training events in these regions. Increasing the ARI threshold to 2 years or removing ST4 data from training, optimizing forecast skill geographically, and spatially averaging meteorological inputs for training generally result in improved CONUS-wide RF forecast skill. Both EROs and RF forecasts have seasonal skill: poor forecasts in the late fall and winter and skillful forecasts in the summer and early fall. However, the EROs are consistently and significantly better than their RF counterparts, regardless of RF configuration, particularly in the summer months. The results suggest that care should be taken when developing ML-based probabilistic precipitation forecasts with convection-allowing model inputs, and that further development is necessary before these forecast products can be considered for operational implementation.
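The description compares the skill of probabilistic forecasts without naming a metric. As a rough illustration only, the sketch below computes a Brier skill score against a constant base-rate reference, which is one common way to score probabilistic event forecasts. The choice of metric, the function names, and the sample values are all assumptions made for this example and are not taken from the dataset or the associated paper.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to a constant forecast of the sample event frequency."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0.0:
        raise ValueError("reference score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


# Hypothetical values for illustration only; not taken from the dataset.
rf_probs = [0.05, 0.10, 0.60, 0.02, 0.40, 0.01]
observed = [0, 0, 1, 0, 1, 0]
print(round(brier_skill_score(rf_probs, observed), 3))
```

A score of 1 indicates a perfect forecast, 0 indicates no improvement over the base-rate reference, and negative values indicate a forecast worse than the reference.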

Created:
2024-01-31