Examining wildfire dynamics using ECOSTRESS data with machine learning approaches: The case of South-Eastern Australia's black summer
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10397273
Description:
Our study focuses on the south-eastern region of Australia, where wildfires have become increasingly frequent in recent years. The 2019-2020 bushfire season, widely known as the 'Black Summer', was nevertheless unprecedented in intensity and devastation. For wildfire prediction, this study combined several biophysical factors: the MODIS MCD64A1 fire product, a digital elevation model (DEM), slope, aspect, ECOSTRESS data (evapotranspiration (ET), evaporative stress index (ESI), land surface temperature (LST), and water use efficiency (WUE)), NDVI generated from Sentinel-2 data, and rainfall data. To this end, we designed models that incorporate pre-fire vegetation conditions obtained from ECOSTRESS data to predict the probability of future wildfire occurrence. The predictive power of the models and of the individual biophysical factors was assessed to understand pre-fire vegetation conditions and wildfire susceptibility.
We used nine variables from four sources as explanatory variables (Table 1). Fire occurrences between September 2019 and March 2020 were obtained from the MODIS MCD64A1 product as a shapefile and mapped. A dataset was created to record the presence and absence of fires, classified as 0 and 1, respectively. Rainfall data for all seven months were obtained from the Bureau of Meteorology, Australia, then compiled and interpolated with the Inverse Distance Weighting (IDW) method, using the IDW tool from the ArcGIS Spatial Analyst extension. The DEM derivatives, slope and aspect, were created with the Slope and Aspect tools in ArcGIS Pro. Sentinel-2 L2A (16-bit) data were downloaded from the Sentinel Hub EO Browser at a resolution of 10 m, and NDVI was mapped using bands 4 and 8. All variable rasters were clipped to the study area. ECOSTRESS data products, including Evapotranspiration (ET), Evaporative Stress Index (ESI), Land Surface Temperature (LST), and Water Use Efficiency (WUE), were acquired from NASA LP DAAC AppEEARS and used to model wildfire dynamics (Fisher et al., 2020; Zhu et al., 2022). For each variable, a mosaic dataset in raster format was created over the seven months between September 2019 and March 2020.
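As an illustration of two of the steps above, the following is a minimal pure-Python sketch of a per-pixel NDVI calculation from red (band 4) and near-infrared (band 8) reflectance and of an IDW estimate at a single location. It is not the ArcGIS workflow used in the study. The function names, the sample gauge values, the distance exponent of 2, and the handling of empty pixels are all assumptions made for the example.

```python
import math


def ndvi(red, nir):
    """NDVI for one pixel from red (band 4) and near-infrared (band 8) reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: report 0 for empty pixels instead of failing
    return (nir - red) / total


def idw(stations, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    stations: list of (x, y, value) tuples, for example rain gauges.
    power: distance exponent; 2 is assumed here, the source does not state it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical gauges: (x, y, rainfall in mm)
gauges = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint_rain = idw(gauges, 1.0, 0.0)  # equidistant from both gauges
```

Because the query point in the usage line is equidistant from both gauges, the two weights are equal and the estimate reduces to the plain average of the two readings.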
Table 1. Explanatory variables used in this research and their data sources

| Category | Explanatory variable | Source |
| --- | --- | --- |
| ECOSTRESS | Evapotranspiration (ET) | 70 m resolution ECOSTRESS data from LP DAAC AppEEARS https://lpdaacsvc.cr.usgs.gov/appeears/ |
| ECOSTRESS | Evaporative stress index (ESI) | 70 m resolution ECOSTRESS data from LP DAAC AppEEARS https://lpdaacsvc.cr.usgs.gov/appeears/ |
| ECOSTRESS | Land surface temperature (LST) | 70 m resolution ECOSTRESS data from LP DAAC AppEEARS https://lpdaacsvc.cr.usgs.gov/appeears/ |
| ECOSTRESS | Water use efficiency (WUE) | 70 m resolution ECOSTRESS data from LP DAAC AppEEARS https://lpdaacsvc.cr.usgs.gov/appeears/ |
| Vegetation index | Normalized Difference Vegetation Index (NDVI) | Sentinel-2 data (10 m resolution, bands 4 and 8) https://scihub.copernicus.eu/dhus/#/home |
| Climate | Rainfall | Bureau of Meteorology, Australia http://www.bom.gov.au/climate/data/ |
| Topography | Elevation | 9 arc-second DEM (~250 m resolution) from Geoscience Australia (Hutchinson et al., 2008) |
| Topography | Slope | Derived from DEM |
| Topography | Aspect | Derived from DEM |
Two categories of models were developed in this study: general models and monthly models. The general models were built to estimate wildfire susceptibility and to quantify the importance of the input biophysical factors over the entire wildfire period, from September 2019 to March 2020. They used the mean values of the explanatory variables over this period as independent input variables, with samples collected from MODIS ground fire points during 2019-2020 serving as the dependent variable. The explanatory variables covered ECOSTRESS data, vegetation indices, climatic parameters, and topographical factors, so that their respective impacts on wildfire prediction could be assessed quantitatively.
The monthly models were designed to capture pre-fire vegetation conditions and predict wildfire spread one week ahead. We collected data over a three-week window, left the 4th week as a lag before the wildfire event, and predicted the probability of wildfire occurrence in the following week (the 5th week). The mean values of the selected data over the three weeks were computed to minimize or eliminate gaps. For example, the model predicting wildfire occurrence probability in the first week of September (September 1-7) was built using the mean values of the explanatory variables over the three weeks from August 1 to August 21. This design aims to produce an effective model for predicting wildfire spread and to assess the impact of pre-fire plant stress on subsequent wildfire occurrence. The Australian bushfires started to spread in the first week of September 2019 and faded in early April 2020. In south-eastern Australia, the fires ceased at the end of October 2019 and reignited in late November 2019. To understand how the change in climate conditions after the first fire affected the country, and to assess the factors influencing fire, we built three monthly models to predict (1) the first week of September (the week when the first wildfire started), (2) the last week of November, and (3) the first week of December (the weeks when the second fire started).
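The windowing described above can be sketched with simple date arithmetic. This is an illustrative sketch only: the function names and the `gap_days` parameter are hypothetical, and missing observations are assumed to be represented as `None`. The August 1-21 example in the text corresponds to a 10-day gap before September 1, whereas a strict one-week lag would use 7 days, so the gap is left as an explicit argument.

```python
from datetime import date, timedelta


def predictor_window(target_start, gap_days, window_days=21):
    """Return (first_day, last_day) of the predictor window.

    target_start: first day of the week to be predicted.
    gap_days: number of unused days between the window and the target week.
    window_days: window length; 21 days matches the three weeks in the text.
    """
    last_day = target_start - timedelta(days=gap_days + 1)
    first_day = last_day - timedelta(days=window_days - 1)
    return first_day, last_day


def mean_ignoring_gaps(observations):
    """Mean of the available observations; None marks a missing acquisition."""
    valid = [value for value in observations if value is not None]
    if not valid:
        return None  # no usable observation in the window
    return sum(valid) / len(valid)


# Window for the model predicting September 1-7, 2019
window = predictor_window(date(2019, 9, 1), gap_days=10)

# Hypothetical per-pixel values with one missing acquisition
pixel_mean = mean_ignoring_gaps([2.0, None, 4.0])
```

Averaging only the available acquisitions is one simple way to obtain a gap-free predictor per pixel; the source does not specify how gaps were handled beyond taking the three-week mean.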
Machine learning is based on algorithms that can learn from data and make effective predictions. The learning process models the hidden relationships between a set of input variables (explanatory variables) and the occurrences of the phenomenon (the dependent variable) (Tonini et al., 2020). We acquired 2037 wildfire occurrence points. Of these, 70% (1426 points) were allocated for training, while the remaining 30% (611 points) were reserved for validation. We evaluated the Linear Regression (LR), Geographically Weighted Regression (GWR), and Random Forest (RF) algorithms to create models that fit the relationships between wildfire events and the explanatory variables. The fitted relationships were then used for susceptibility mapping and for assessing variable influence. LR, in particular, requires the explanatory variables to be independent. To mitigate the impact of correlation between these variables, we applied LASSO (L1 regularization). LASSO penalizes the coefficients of correlated variables, prompting the LR model to favor a subset of independent variables and improving model robustness (Qian et al., 2012). Before applying LR and GWR, we normalized the explanatory variables to a common scale (between 0 and 1) based on their observed maximum and minimum values (Zhu et al., 2022). This normalization ensures equal contributions from all variables and makes variable importance straightforward to compare and interpret.
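The min-max normalization and the 70/30 split described above can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the function names, the random seed, the shuffling strategy, and the sample temperature values are assumptions.

```python
import random


def min_max_scale(values):
    """Rescale values to [0, 1] using their observed minimum and maximum."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # assumption: constant variables map to 0
    return [(value - lowest) / (highest - lowest) for value in values]


def split_points(points, train_fraction=0.7, seed=0):
    """Shuffle the points and split them into training and validation sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 2037 occurrence points, represented here by their indices
train, valid = split_points(range(2037))

# Hypothetical land surface temperatures in kelvin
scaled = min_max_scale([290.0, 300.0, 310.0])
```

With 2037 points, rounding 70% gives the 1426 training and 611 validation points reported in the text.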
In addition, to evaluate the accuracy of the wildfire susceptibility modeling, pixels were categorized as either fire or non-fire based on a probability threshold of 0.5. Pixels with a probability greater than 0.5 were identified as fire pixels, while those below the threshold were not considered in the process. A confusion matrix, which summarizes the performance of a classification model that predicts two or more classes, was used to derive the accuracy, sensitivity, and specificity of the model's outcomes (Parikh et al., 2008).
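The evaluation step can be illustrated with a small sketch that thresholds predicted probabilities at 0.5 and derives accuracy, sensitivity, and specificity from the resulting confusion matrix. The function names and the sample values are hypothetical. The sketch assumes that pixels at or below the threshold count as non-fire and that observations are given as booleans (True for fire).

```python
def confusion_counts(probabilities, observed_fire, threshold=0.5):
    """Count true/false positives and negatives for a binary fire map."""
    tp = fp = tn = fn = 0
    for probability, is_fire in zip(probabilities, observed_fire):
        predicted_fire = probability > threshold
        if predicted_fire and is_fire:
            tp += 1
        elif predicted_fire and not is_fire:
            fp += 1
        elif not predicted_fire and is_fire:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division; returns None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical predictions and observations for six pixels
probabilities = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
observed_fire = [True, True, True, False, False, False]
counts = confusion_counts(probabilities, observed_fire)
metrics = summary_metrics(*counts)
```

Sensitivity measures how many observed fire pixels the model recovers, and specificity how many non-fire pixels it correctly leaves unflagged, which is why both are reported alongside overall accuracy.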
This dataset covers the south-eastern region of Australia during 2019-2020. It includes the input explanatory variables of the general and monthly models, wildfire susceptibility for each city, and fire locations.
Created:
2024-01-05



