Uncertainty Quantification of Machine Learning Models to Improve Streamflow Prediction Under Changing Climate and Environmental Conditions
  • Dan Lu,
  • Siyan Liu,
  • Scott L Painter,
  • Natalie A Griffiths,
  • Eric M Pierce
Dan Lu
Oak Ridge National Laboratory (DOE)

Corresponding Author: [email protected]

Siyan Liu
Oak Ridge National Laboratory
Scott L Painter
Oak Ridge National Laboratory, USA
Natalie A Griffiths
Oak Ridge National Laboratory (DOE)
Eric M Pierce
Oak Ridge National Laboratory (DOE)

Abstract

Machine learning (ML) models, and Long Short-Term Memory (LSTM) networks in particular, have demonstrated remarkable performance in streamflow prediction and are increasingly used by the hydrological research community. However, most of these applications do not include uncertainty quantification (UQ). ML models are data driven and may suffer from large extrapolation errors when applied under changing climate or environmental conditions. UQ is required to ensure model trustworthiness, improve understanding of data limits and model deficiencies, and avoid overconfident predictions in extrapolation. Here, we propose a novel UQ method, called PI3NN, to quantify the prediction uncertainty of ML models, and we integrate the method with LSTM networks for streamflow prediction. PI3NN calculates Prediction Intervals by training 3 Neural Networks and uses root-finding methods to determine the interval precisely. Additionally, PI3NN can identify out-of-distribution (OOD) data under nonstationary conditions to avoid overconfident predictions. We apply the proposed PI3NN-LSTM method to both the snow-dominant East River Watershed in the western US and the rain-driven Walker Branch Watershed in the southeastern US. Results indicate that for prediction data whose features are similar to those of the training data, PI3NN precisely quantifies the prediction uncertainty at the desired confidence level; for OOD data, where the LSTM network fails to make accurate predictions, PI3NN produces a reasonably large uncertainty bound that flags the result as untrustworthy and thereby avoids overconfidence. PI3NN is computationally efficient, reliable in training, and generalizable to various network structures and to data with no distributional assumptions. It can be broadly applied in ML-based hydrological simulations for credible prediction.
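The root-finding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so the following is an assumption about how such a calibration might look rather than the authors' implementation. The three trained networks are replaced by hypothetical stand-in functions (`mean_net`, `upper_net`, `lower_net`), the data are synthetic, and the helper `bisect_scale` is introduced here purely for illustration. The idea shown is that two scale factors, `alpha` and `beta`, are found by bisection so that the fraction of training points falling outside each side of the interval matches the target implied by the confidence level.

```python
import math
import random

# Hypothetical stand-ins for the three trained networks (assumptions, not PI3NN code).
def mean_net(x):
    return math.sin(x)   # central prediction

def upper_net(x):
    return 1.0           # unscaled width above the prediction

def lower_net(x):
    return 1.0           # unscaled width below the prediction

def bisect_scale(residuals, widths, target_fraction, lo=0.0, hi=1e3, tol=1e-6):
    """Bisection for a scale c such that the fraction of points with
    residual > c * width is at most target_fraction (as close as the data allow)."""
    n = len(residuals)

    def excess(c):
        outside = sum(1 for r, w in zip(residuals, widths) if r > c * w)
        return outside / n - target_fraction

    # excess(c) is non-increasing in c, so plain bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid     # too many points outside: widen the bound
        else:
            hi = mid     # target met: try a tighter bound
    return hi

# Synthetic training data with skewed, median-zero noise (illustrative only).
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(2000)]
ys = [mean_net(x) + rng.expovariate(2.0) - math.log(2.0) / 2.0 for x in xs]

confidence = 0.90
tail = (1.0 - confidence) / 2.0   # target fraction outside each side

res = [y - mean_net(x) for x, y in zip(xs, ys)]
alpha = bisect_scale(res, [upper_net(x) for x in xs], tail)
beta = bisect_scale([-r for r in res], [lower_net(x) for x in xs], tail)

# Empirical coverage of the calibrated interval on the training data.
inside = sum(
    1 for x, y in zip(xs, ys)
    if mean_net(x) - beta * lower_net(x) <= y <= mean_net(x) + alpha * upper_net(x)
)
coverage = inside / len(xs)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, coverage={coverage:.3f}")
```

Because the two sides are calibrated separately, the interval can be asymmetric around the prediction when the residuals are skewed, and no distributional form is assumed for the noise. Changing `confidence` only requires rerunning the two bisections, not retraining the stand-in networks.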