Groundwater level simulations using wavelet transform-assisted deep learning models
Abstract
Groundwater level (GWL) simulations based on deep learning (DL) models are gaining traction owing to the success of such models in a wide range of hydrological applications. GWL simulations can be used to reconstruct the past temporal variability of groundwater resources or to generate projections under climate change on decadal scales. Because GWL variability is driven by a diversity of large-scale and local-scale forcing factors, machine learning and deep learning approaches are relevant tools for simulating GWL. In addition, such methods do not require extensive knowledge of the physical variables governing the links between climate variables and GWL.
In this paper, we investigated the capacity of three deep learning models, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU) and Bidirectional LSTM (BLSTM), to reproduce GWL variations over time. Among the three, GRU generally performed better than the other two in most cases. We also evaluated the impact of the input data and the usefulness of wavelet pre-processing, taking into account its limitations and best practices. Two input datasets were compared: one consisting of effective precipitation only, the other of precipitation and temperature.
Maximal Overlap Discrete Wavelet Transform (MODWT) pre-processing was used to decompose the input variables, in order to explore whether the wavelet transform improves simulations of several types of GWL time series by unravelling “hidden” but useful information in the input data. Results show that MODWT pre-processing helps the models generate better simulations. The improvement is larger with raw climate data (precipitation and temperature) than with effective precipitation as input. Finally, the Shapley Additive exPlanations (SHAP) approach was used to interpret the impact of the input variables on the model simulations. Analysis of SHAP values indicated the sources of information that the models preferentially learned from to achieve the best simulations. For instance, it was clear that simulating inertial and mixed GWL required the models to learn from the low-frequency variability present in the input data.
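As an illustration of the decomposition step described above, the following is a minimal Python sketch of a MODWT computed with the Haar filter and circular boundary handling. The filter choice, the number of levels, the function name `haar_modwt` and the synthetic input series are assumptions made here for illustration only; they are not taken from the study.

```python
import numpy as np

def haar_modwt(x, levels):
    """Haar MODWT with circular boundaries (illustrative sketch).

    Returns (details, smooth): details has shape (levels, len(x)) and
    holds the wavelet coefficients W_1..W_J; smooth holds the scaling
    coefficients V_J at the coarsest level.
    """
    if levels < 1:
        raise ValueError("levels must be >= 1")
    smooth = np.asarray(x, dtype=float).copy()
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)               # filter dilation at level j
        shifted = np.roll(smooth, lag)   # shifted[t] = smooth[t - lag], circular
        details.append(0.5 * (smooth - shifted))  # high-pass (wavelet) part
        smooth = 0.5 * (smooth + shifted)         # low-pass (scaling) part
    return np.array(details), smooth

# Hypothetical monthly precipitation series (synthetic, for demonstration).
rng = np.random.default_rng(0)
precip = rng.gamma(shape=2.0, scale=5.0, size=120)

details, smooth = haar_modwt(precip, levels=3)

# One column per component: 3 detail levels plus the final smooth.
features = np.vstack([details, smooth]).T
```

With the Haar filter, the detail levels and the final smooth add back up to the original series, and every component keeps the length of the input, so each one could be passed to a recurrent model as a separate input feature. With circular filtering, the first 2^j − 1 coefficients at level j mix in values from the end of the series, which is one of the boundary limitations that wavelet pre-processing has to account for.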