Dan Lu

and 4 more

Machine learning (ML) models, and Long Short-Term Memory (LSTM) networks in particular, have demonstrated remarkable performance in streamflow prediction and are increasingly being used by the hydrological research community. However, most of these applications do not include uncertainty quantification (UQ). ML models are data driven and may suffer from large extrapolation errors when applied to changing climate/environmental conditions. UQ is required to ensure model trustworthiness, improve understanding of data limits and model deficiencies, and avoid overconfident predictions in extrapolation. Here, we propose a novel UQ method, called PI3NN, to quantify prediction uncertainty of ML models and integrate the method with LSTM networks for streamflow prediction. PI3NN calculates Prediction Intervals by training 3 Neural Networks and uses root-finding methods to determine the interval precisely. Additionally, PI3NN can identify out-of-distribution (OOD) data in a nonstationary condition to avoid overconfident prediction. We apply the proposed PI3NN-LSTM method in both the snow-dominant East River Watershed in the western US and the rain-driven Walker Branch Watershed in the southeastern US. Results indicate that for the prediction data (which have similar features as the training data), PI3NN precisely quantifies the prediction uncertainty with the desired confidence level; and for the OOD data where the LSTM network fails to make accurate predictions, PI3NN produces a reasonably large uncertainty bound indicating the untrustworthy result to avoid overconfidence. PI3NN is computationally efficient, reliable in training, and generalizable to various network structures and data with no distributional assumptions. It can be broadly applied in ML-based hydrological simulations for credible prediction.