Interpretable machine learning for analysing heterogeneous drivers of geographic events in space-time

Figshare · Updated 2021-07-27 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Interpretable_machine_learning_for_analysing_heterogeneous_drivers_of_geographic_events_in_space-time/14152016

Resource description:

ABSTRACT

Machine learning (ML) interpretability has become increasingly crucial for identifying accurate and relevant structural relationships between spatial events and the factors that explain them. Methodologically aspatial ML algorithms with apparently high predictive power ignore non-stationary domain relationships in spatio-temporal data (e.g., dependence, heterogeneity), leading to incorrect interpretations and poor management decisions. This study addresses this critical methodological issue of ‘interpretability’ in ML-based modeling of structural relationships, using the example of heterogeneous drivers of wildfires across the United States. Specifically, we present and evaluate a spatio-temporally interpretable random forest (iST-RF) that uses spatio-temporal sampling-based training and weighted prediction. Although the ultimate scientific objective is to derive interpretation in space-time, experiments show that iST-RF can improve predictive accuracy (76%) compared to the aspatial RF approach (70%), while enhancing interpretations of the trained model’s spatio-temporal relevance for its ensemble prediction. This novel approach can help balance prediction and interpretation with fidelity in a spatial data science life cycle. However, challenges exist for predictive modeling when the dataset is very small, because in such cases the prediction performance of locally optimized sub-models can be suboptimal. With that caveat, our proposed approach is an ideal choice for identifying drivers of spatio-temporal events in country- or regional-scale studies.

Author contributions

A.M. conceived and designed the study, coded and performed data processing, modeling and interpretations, and wrote the manuscript. M.Y., P.M., D.P., and A.T. contributed to the refinement of the proposed methodology, experiments, and write-up. All authors reviewed the manuscript.
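The abstract describes iST-RF only at a high level: sub-models trained on spatio-temporal samples, combined through weighted prediction. As a rough illustration of what the weighted prediction step could look like, the sketch below combines the outputs of locally trained sub-models using weights that decay with space-time distance from each sub-model's center. This is not the authors' implementation. The Gaussian kernel, the bandwidths `h_space` and `h_time`, the `(x, y, t)` point layout, and the function names are all assumptions made for illustration, and the sub-models are plain callables standing in for trained random forests.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical layout: (x, y, t) coordinates of a location in space-time.
Point = Tuple[float, float, float]
# Hypothetical sub-model: maps a feature vector to a probability in [0, 1].
SubModel = Callable[[Sequence[float]], float]


def st_weight(query: Point, center: Point, h_space: float, h_time: float) -> float:
    """Gaussian kernel weight over scaled space-time distance (an assumed choice)."""
    dx = query[0] - center[0]
    dy = query[1] - center[1]
    dt = query[2] - center[2]
    d2 = (dx * dx + dy * dy) / (h_space ** 2) + (dt * dt) / (h_time ** 2)
    return math.exp(-0.5 * d2)


def weighted_ensemble_predict(
    query: Point,
    features: Sequence[float],
    sub_models: Sequence[Tuple[Point, SubModel]],
    h_space: float = 1.0,
    h_time: float = 1.0,
) -> float:
    """Combine sub-model outputs, weighting each by its proximity to the query."""
    weights = [st_weight(query, center, h_space, h_time) for center, _ in sub_models]
    total = sum(weights)
    if total == 0.0:
        # Every sub-model is too far away to register: fall back to a plain mean.
        return sum(model(features) for _, model in sub_models) / len(sub_models)
    weighted_sum = sum(w * model(features) for w, (_, model) in zip(weights, sub_models))
    return weighted_sum / total


# Two stand-in sub-models with constant outputs, centered 10 units apart in x.
demo_models = [
    ((0.0, 0.0, 0.0), lambda features: 0.9),
    ((10.0, 0.0, 0.0), lambda features: 0.1),
]
near_first = weighted_ensemble_predict((0.0, 0.0, 0.0), [1.0], demo_models)
midway = weighted_ensemble_predict((5.0, 0.0, 0.0), [1.0], demo_models)
```

In this toy setup, a query at the first sub-model's center is dominated by that sub-model, while a query halfway between the two centers receives equal weights. The per-sub-model weights are also what would make such an ensemble inspectable: they show which local models drive a given prediction.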

Created:

2021-07-27