Zachary McEachran

and 8 more

We present a knowledge-guided machine learning framework for operational hydrologic forecasting at the catchment scale. Our approach, a Factorized Hierarchical Neural Network (FHNN), has two main components: an inverse model and a forward model. The inverse model uses observed precipitation, temperature, and streamflow data to generate a representation of the current underlying catchment state. The forward model predicts streamflow using the learned catchment state. The FHNN architecture is designed to model multi-scale processes and capture their interactions while providing explainability and interpretability. FHNN also improves forecasts based on real-time data through an inference-based data integration approach. This approach updates forecasts in response to observed data more efficiently than data assimilation methods (e.g., ensemble Kalman filtering) that require computationally intensive optimization: once an inverse model is trained, it can quickly infer catchment states directly from data in real time. To show the operational performance of FHNN, we compare FHNN forecasts with those of an expert human hydrologic forecaster using a physics-based model, where both use the same imperfectly known future precipitation forecast. The expert human forecaster creates a more accurate forecast within the first 18 hours of a forecast’s issuance, but FHNN produces significantly better predictions at longer lead times. Additionally, FHNN internal states correlate strongly with internal physics-based model states, such as soil moisture, in a synthetic case. This research lays the groundwork for combining the predictive performance of AI-based models with the expertise in forecasting agencies to produce better river forecasts at all lead times.
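The inverse/forward split described above can be illustrated with a minimal data-flow sketch. This is not the FHNN architecture: the linear maps, the untrained random weights, the state dimension, and the names `inverse_model` and `forward_model` are all assumptions made for illustration. It shows only how a state inferred from recent observations could feed a forecast, and how new observations could be integrated by re-running inference rather than by optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 4  # assumed size of the latent catchment state

# Untrained placeholder weights (assumption: a real model would learn these).
W_inv = rng.normal(size=(3, STATE_DIM)) * 0.1
W_fwd_state = rng.normal(size=(STATE_DIM,)) * 0.1
W_fwd_forcing = rng.normal(size=(2,)) * 0.1


def inverse_model(history):
    """Map observed history (T, 3) = [precip, temp, streamflow]
    to a latent catchment state of shape (STATE_DIM,)."""
    return np.tanh(history.mean(axis=0) @ W_inv)


def forward_model(state, future_forcing):
    """Map the state and forecast forcing (H, 2) = [precip, temp]
    to a streamflow forecast of shape (H,)."""
    return future_forcing @ W_fwd_forcing + state @ W_fwd_state


history = rng.random((30, 3))   # 30 past time steps of observations
future = rng.random((7, 2))     # 7-step weather forecast

state = inverse_model(history)
forecast = forward_model(state, future)

# Data integration by inference: when a new observation arrives,
# re-run the inverse model on the updated window. No optimization loop.
new_obs = rng.random((1, 3))
updated_history = np.vstack([history[1:], new_obs])
updated_forecast = forward_model(inverse_model(updated_history), future)
```

In this sketch the update step costs a single pass through the inverse model, which is the property the abstract contrasts with optimization-based data assimilation.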

Xiang Li

and 11 more

Streamflow prediction is a long-standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across catchments of different scales, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state-of-the-art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves a predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and choices of deep learning model. Furthermore, based on the use of random vectors, high-dimensional characterization improves regionalization performance in gauged basin scenarios when physical descriptors are uncertain or insufficient.
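The random vector idea can be sketched as follows. Each catchment receives one fixed random vector, which is then repeated along the time axis and appended to the weather forcing in place of physical descriptors. The uniform distribution, the vector length of 8, the basin names, and the helper names `make_random_descriptors` and `build_inputs` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def make_random_descriptors(catchment_ids, dim=8, seed=0):
    """Assign each catchment one fixed random vector (hypothetical helper).
    The vector identifies the catchment; it carries no physical meaning."""
    rng = np.random.default_rng(seed)
    return {cid: rng.uniform(0.0, 1.0, size=dim) for cid in catchment_ids}


def build_inputs(forcing, descriptor):
    """Repeat the static vector at every time step and append it to the
    weather forcing: (T, F) and (dim,) -> (T, F + dim)."""
    tiled = np.tile(descriptor, (forcing.shape[0], 1))
    return np.concatenate([forcing, tiled], axis=1)


catchments = ["basin_a", "basin_b", "basin_c"]  # placeholder IDs
vectors = make_random_descriptors(catchments, dim=8)

forcing = np.zeros((365, 5))  # one year of 5 placeholder forcing variables
x = build_inputs(forcing, vectors["basin_a"])
```

The resulting array `x` would be the per-catchment input to a single model trained across all gauged basins. Because the vectors are fixed per catchment, the same vector must be reused at prediction time, which is consistent with the gauged setting the abstract describes.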