Uncertainty Quantification of Machine Learning Models to Improve
Streamflow Prediction Under Changing Climate and Environmental
Conditions
Abstract
Machine learning (ML) models, and Long Short-Term Memory (LSTM) networks
in particular, have demonstrated remarkable performance in streamflow
prediction and are increasingly being used by the hydrological research
community. However, most of these applications do not include
uncertainty quantification (UQ). ML models are data driven and may
suffer from large extrapolation errors when applied to changing
climate/environmental conditions. UQ is required to ensure model
trustworthiness, improve understanding of data limits and model
deficiencies, and avoid overconfident predictions in extrapolation.
Here, we propose a novel UQ method, called PI3NN, to quantify prediction
uncertainty of ML models and integrate the method with LSTM networks for
streamflow prediction. PI3NN calculates Prediction Intervals by training
3 Neural Networks and uses root-finding methods to determine the
interval precisely. Additionally, PI3NN can identify out-of-distribution
(OOD) data in a nonstationary condition to avoid overconfident
prediction. We apply the proposed PI3NN-LSTM method in both the
snow-dominant East River Watershed in the western US and the rain-driven
Walker Branch Watershed in the southeastern US. Results indicate that
for the prediction data (which have similar features as the training
data), PI3NN precisely quantifies the prediction uncertainty with the
desired confidence level; and for the OOD data where the LSTM network
fails to make accurate predictions, PI3NN produces a reasonably large
uncertainty bound indicating the untrustworthy result to avoid
overconfidence. PI3NN is computationally efficient, reliable in
training, and generalizable to various network structures and data with
no distributional assumptions. It can be broadly applied in ML-based
hydrological simulations for credible prediction.