Prediction of multi-sectoral longitudinal water withdrawals using
hierarchical machine learning models
Abstract
Accurate models of water withdrawal are crucial in anticipating the
potential water use impacts of drought and climate change.
Machine learning methods are increasingly used in water withdrawal
prediction due to their ability to model the complex, nonlinear
relationship between water use and potential explanatory factors.
However, most machine learning methods do not explicitly address the
hierarchical nature of water use data, where multiple observations are
typically available for multiple facilities, and these facilities can be
grouped and organized in a variety of different ways. This work presents
a novel approach for prediction of water withdrawals across multiple
usage sectors using an ensemble of models fit at different hierarchical
levels. A dataset of over 300,000 records of water withdrawal was used
to fit models at the facility and sectoral grouping levels, as well as
across facility clusters defined by temporal water use characteristics.
Using repeated holdout cross-validation, this work demonstrates that ensemble
predictions based on models learned from different data groupings
improve withdrawal predictions for 63% of facilities relative to
facility-level models. The relative improvement gained by ensemble
modeling was greatest for facilities with fewer observations and higher
variance, indicating its potential value in predicting withdrawal for
facilities with relatively short data records or data quality issues.
Inspection of the ensemble weights indicated that cluster-level weights
were often higher than sector-level weights, pointing towards the value
of learning from the behavior of facilities with similar water use
patterns, even if they belong to different sectors.
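
As an illustration of how predictions from models fit at the facility, sector, and cluster levels might be combined, the following is a minimal Python sketch. It is not the method used in this work: the abstract does not state how the ensemble weights are obtained, so inverse mean-squared-error weighting on holdout data is assumed here as a simple stand-in. The function names, the synthetic data, and the noise levels are all hypothetical.

```python
import numpy as np


def inverse_mse_weights(holdout_preds, observed):
    """Weight each model by the inverse of its holdout mean squared error.

    Assumption: this weighting rule is a stand-in for illustration only.
    holdout_preds has shape (n_obs, n_models); observed has shape (n_obs,).
    """
    preds = np.asarray(holdout_preds, dtype=float)
    obs = np.asarray(observed, dtype=float)
    mse = np.mean((preds - obs[:, None]) ** 2, axis=0)
    inv = 1.0 / np.maximum(mse, 1e-12)  # guard against division by zero
    return inv / inv.sum()              # non-negative weights that sum to 1


def ensemble_predict(preds, weights):
    """Combine per-model predictions as a weighted average."""
    return np.asarray(preds, dtype=float) @ weights


def rmse(pred, obs):
    """Root mean squared error between predictions and observations."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


# Synthetic withdrawals for one hypothetical facility (arbitrary units).
rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=50.0, size=200)

# Hypothetical predictions from models fit at three hierarchical levels.
# The noise levels are invented so that the levels differ in accuracy.
levels = ["facility", "sector", "cluster"]
noise_sd = [30.0, 20.0, 10.0]
preds = np.column_stack(
    [truth + rng.normal(0.0, sd, size=truth.size) for sd in noise_sd]
)

# Fit weights on the first half (holdout), evaluate on the second half.
weights = inverse_mse_weights(preds[:100], truth[:100])
combined = ensemble_predict(preds[100:], weights)

facility_rmse = rmse(preds[100:, 0], truth[100:])
ensemble_rmse = rmse(combined, truth[100:])

for name, w in zip(levels, weights):
    print(f"{name:>8s} weight: {w:.3f}")
print(f"facility-level RMSE: {facility_rmse:.2f}")
print(f"ensemble RMSE:       {ensemble_rmse:.2f}")
```

In this sketch a level that predicts the holdout data more accurately receives a larger weight, which is one way a cluster-level model could come to outweigh a sector-level model for a given facility.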