Machine learning in coupled wildfire-water supply risk assessment: Data
science toolkit
Abstract
The frontier of wildfire-related risk assessment is moving into data
science territory, and with good reason. Computational statistics, built
on a foundation of high-resolution remote sensing data, ground data, and
theory, forms the basis of powerful risk assessment tools. The need for
data-based risk assessment has increased in recent years, in view of
longer wildfire seasons in the U.S., associated with more frequent
droughts, more human ignitions and accumulating fuel loads. We present
an application of machine learning (ML), which makes it possible to
analyze complex data without defining interactions a priori. This
is a major advantage because these interactions are not known
beforehand. Specifically, we build a stochastic gradient boosting
machine (GBM) toolkit to assess the change in river flow after wildfire
in the contiguous United States (CONUS) over a 5-year period. The GBM
accounts for nonlinear relationships and interactions between wildland
fire characteristics, watershed geometry, climate variability,
topography and land cover. Building the GBM is a sequential process
in which a loss function is minimized at each iteration, along a
gradient defined by pseudo-residuals. This process allows the program to
progressively learn more about how the variables in the large dataset
interact to result in the response (i.e., river flow). Our results show
that wildfires increase annual river flow in the CONUS when more than
20% of a gaged basin is burned. Data science tools like the GBM
presented here are essential for generating practical knowledge on how
wildfire impacts on ecohydrology can ultimately affect hydrological
services, socio-hydrosystems and water security in fire-affected
regions.
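
The sequential fitting described above can be illustrated with a minimal
sketch. It is not the toolkit presented in this work: it assumes a
squared-error loss, one-split regression stumps as base learners, a single
predictor, and purely synthetic data. All names (fit_stump, fit_gbm,
predict) and parameter values are hypothetical. At each iteration the
pseudo-residuals (the negative gradient of the loss) are computed, a stump
is fitted to them on a random subsample of rows (the "stochastic" part),
and the ensemble is updated with a small learning rate.

```python
import random

def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on one feature x."""
    values = sorted(set(x))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0  # candidate threshold between distinct values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # no split possible: fall back to the mean
        mean = sum(r) / len(r)
        return (x[0], mean, mean)
    return best[1:]

def predict_stump(stump, xi):
    t, left_mean, right_mean = stump
    return left_mean if xi <= t else right_mean

def fit_gbm(x, y, n_iter=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss (toy version)."""
    rng = random.Random(seed)
    n = len(y)
    f0 = sum(y) / n                      # initial constant prediction
    pred = [f0] * n
    stumps, history = [], []
    n_sub = max(2, int(subsample * n))
    for _ in range(n_iter):
        # Pseudo-residuals: negative gradient of 0.5 * (y - F)^2 w.r.t. F.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        # Stochastic step: fit the base learner on a random row subsample.
        idx = rng.sample(range(n), n_sub)
        stump = fit_stump([x[i] for i in idx], [resid[i] for i in idx])
        stumps.append(stump)
        # Shrunken update of the ensemble prediction on all rows.
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n)
    return (f0, learning_rate, stumps), history

def predict(model, xi):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(predict_stump(s, xi) for s in stumps)

# Synthetic data only (assumption): a response that stays flat below a
# threshold in the predictor and rises above it, plus Gaussian noise.
data_rng = random.Random(42)
x = [i / 100 for i in range(100)]
y = [max(0.0, xi - 0.2) + data_rng.gauss(0.0, 0.02) for xi in x]

model, history = fit_gbm(x, y)
print(f"training MSE: first iteration {history[0]:.5f}, "
      f"last iteration {history[-1]:.5f}")
```

In this sketch the threshold-like shape of the response is never specified
in advance; the ensemble recovers it from the data, which is the property
the abstract highlights. A production analysis would more likely rely on
an established GBM implementation with multi-split trees and many
predictors.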