Improving Discharge Predictions in Ungauged Basins: Harnessing the Power
of Disaggregated Data Modeling and Machine Learning
Abstract
Current machine learning methods for discharge prediction often employ
aggregated basin-wide hydrometeorological data (lumped modeling) for
parametric and non-parametric training. This approach may overlook the
spatial heterogeneity of river systems and its impact on discharge
patterns. We hypothesize that integrating spatiotemporal hydrologic
knowledge into the data modeling process (distributed/disaggregated
modeling) can improve the performance of discharge prediction models. To
test this hypothesis, we designed experiments comparing the performance
of identical Long Short-Term Memory Recurrent Neural Network (LSTM-RNN)
models forced with either lumped or distributed features. We gathered
meteorological forcing and static attributes for the Mackenzie basin in
Canada, a large and unique basin. Importantly, discharge performance is
assessed out-of-sample with k-fold replication across gauges. Results
reveal a 9.6% improvement in the mean Nash-Sutcliffe Efficiency (NSE)
and a 4.6% improvement in mean Kling-Gupta Efficiency (KGE) when LSTMs
are trained with distributed information. Notably, the models are
consistently unbiased, with negligible relative bias (RBias ≈ 0.0)
across all predictions. These experiments and results
demonstrate the importance of integrating topologically guided
geomorphologic and hydrologic information (distributed modeling) in
data-driven discharge predictions.
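
For readers unfamiliar with the three scores reported above, the following is a minimal sketch of how NSE, KGE, and relative bias are commonly computed from paired observed and simulated discharge series. It assumes the standard NSE definition, the original 2009 form of KGE (correlation, variability ratio, and mean ratio), and relative bias as the summed error divided by the summed observations. The abstract does not state which variants the study uses, and the function names here are illustrative only.

```python
import math
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = fmean(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance_sum = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance_sum


def kge(obs, sim):
    """Kling-Gupta Efficiency (assumed 2009 formulation); 1 is a perfect fit."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    std_obs, std_sim = pstdev(obs), pstdev(sim)
    covariance = fmean([(o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)])
    r = covariance / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs              # variability ratio
    beta = mean_sim / mean_obs             # mean (bias) ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def relative_bias(obs, sim):
    """Relative bias (assumed definition): total error over total observed flow."""
    return sum(s - o for o, s in zip(obs, sim)) / sum(obs)
```

Under these definitions, a simulation that equals the observed mean at every time step scores NSE = 0, and a relative bias near 0.0 means the simulated and observed total volumes nearly match.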