Abstract
Soil moisture is recognized globally as important because it controls
hydrological processes relevant to agriculture and climate studies.
Root zone soil moisture is currently estimated largely with physical
models based on flow and transport equations. However, because the
processes operating in the vadose zone are complex and interact with
one another, parameterizing all of the relevant processes is
challenging.
This complexity is compounded by spatio-temporal variability in soil
and vegetation properties, which requires model parameters to be
dynamic. By contrast, purely data-driven methods for root zone soil
moisture estimation remain limited, despite the growing availability of
datasets from soil moisture monitoring networks established within the
last decade. These datasets are currently used mainly to calibrate and
validate physical models and satellite retrieval methods. In this
study, we explored Random Forest (RF) as an approach for predicting and
forecasting daily root zone soil moisture at selected stations in the
Raam and Twente networks. We trained a single RF using meteorological
data, soil type, land cover type, and leaf area index (LAI) as
predictor variables.
The model was tuned to determine the optimal hyperparameters (mtry, the
number of predictors randomly sampled at each split, and ntree, the
number of trees) and the optimal number of training samples. RF results
were also compared with simulations from the physically based model
Hydrus-1D. Our results show that RF can accurately predict and forecast
root zone soil moisture at the study sites, with RMSE of
0.02–0.12 m³ m⁻³ compared with 0.05–0.22 m³ m⁻³ for the Hydrus-1D
simulations. However, RF performed poorly under saturated conditions.
In addition, at some sites the 5–95% RF prediction intervals widen
under saturated conditions, indicating greater prediction and forecast
uncertainty. RF can be used for root zone soil moisture estimation,
especially in data-poor regions where information on soil hydraulic
parameters is sparse or lacking. It can also be used to estimate
missing values in gaps in soil moisture time series.
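The two evaluation quantities reported above, RMSE and the 5–95% prediction interval, can be illustrated with a minimal sketch. It assumes that the individual tree predictions of an already fitted random forest are available, and it approximates the interval by the empirical 5th and 95th percentiles of those per-tree predictions. This is a simplification that may differ from the interval method used in this study (for example, it is not a quantile regression forest). The function names and all numbers below are hypothetical and for illustration only.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. m3 m-3)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    if not values:
        raise ValueError("values must be non-empty")
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def prediction_interval(tree_predictions, lower=5.0, upper=95.0):
    """Approximate 5-95% interval from the spread of per-tree predictions.

    Simplification (assumption): uses empirical percentiles of the
    individual tree outputs for one day, not a quantile regression forest.
    """
    return percentile(tree_predictions, lower), percentile(tree_predictions, upper)


if __name__ == "__main__":
    # Hypothetical daily observations and RF mean predictions (m3 m-3).
    observed = [0.30, 0.35, 0.40]
    predicted = [0.32, 0.33, 0.44]
    print(f"RMSE: {rmse(observed, predicted):.4f}")  # RMSE: 0.0283

    # Hypothetical predictions of 11 individual trees for a single day.
    trees = [0.30 + 0.01 * i for i in range(11)]
    lo, hi = prediction_interval(trees)
    print(f"5-95% interval: {lo:.4f} to {hi:.4f}")  # 5-95% interval: 0.3050 to 0.3950
```

A wider interval for a given day means the trees disagree more, which is how higher uncertainty under saturated conditions would show up. If the forest were fitted with scikit-learn's `RandomForestRegressor`, per-tree predictions could be collected with `[tree.predict(X) for tree in model.estimators_]`; its `n_estimators` and `max_features` parameters play roughly the roles of ntree and mtry in R's randomForest.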