Figure 5. Boxplots of mean KGE for the evaluation of multiple
variables with different calibration strategies. (I) Evaluation for the
period of calibration (2009–2012); (II) Evaluation for a different
period than calibration (2006–2008 for Q, A, TWS, ET; 2013–2014
for h and W). “Initial guess” refers to model runs with the a priori
parameter sets. (a) Single-variable (discharge, water level, flood
extent, TWS, vegetation ET, soil moisture) and (b) multi-variable
calibration (all except discharge, water level + soil moisture). The
spread of the values in the boxplots stems from 300 model runs (100 for
each of three calibration experiments). Numbers next to the boxplots
represent Skill Score (%). Colors refer to classes of skill score.
Please note that the KGE scales are different for each variable.
Asterisks refer to cases when the evaluation period resulted in a
different performance than the calibration period (i.e., positive Skill
Score in calibration followed by negative Skill Score in evaluation, or
vice-versa). Please note that Skill Score values are computed based on
mean values, while the boxplots depict median values.
How does RS-based model calibration improve the water cycle representation?
When performing a single-variable calibration, the performance of the
variable itself always improves, which is evidenced by the positive
values in the main diagonal (Figure 5-I-a, for calibration period, and
Figure 5-II-a, for evaluation period). Calibration with water level was
also able to improve estimates of flood extent, TWS, ET and soil
moisture (cal period), and all variables (eval period). Calibration with
flood extent improved water level, TWS, ET and soil moisture.
Calibration with TWS improved all variables. Calibration with ET was
able to improve discharge and flood extent. Calibration with soil
moisture improved all variables but ET. Results for the calibration and
evaluation periods agree in sign, i.e., both show improvement (positive
Skill Score) or both show deterioration (negative Skill Score), in 43 of
the 48 cases (89.6%). In the five remaining cases (10.4%), the two
periods differ: three of them are in the evaluation with TWS, and two
are in the discharge evaluation (calibration with water level and flood
extent).
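The bookkeeping behind these numbers can be sketched as follows. This is a minimal illustration, not the code used in this study: the KGE function uses the standard correlation, variability-ratio and bias-ratio formulation, and the Skill Score form (gain in mean KGE relative to the initial guess, normalized by the gain a perfect KGE of 1 would give) is an assumption, since its definition is not restated in this section. The function names `kge`, `skill_score` and `sign_agreement` are hypothetical.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta efficiency in its standard three-component form."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def skill_score(kge_runs, kge_initial, kge_perfect=1.0):
    """Skill Score (%) from the MEAN KGE of the calibrated runs.

    ASSUMPTION: S = (mean KGE - initial KGE) / (perfect KGE - initial KGE).
    """
    mean_kge = float(np.mean(kge_runs))
    return 100.0 * (mean_kge - kge_initial) / (kge_perfect - kge_initial)


def sign_agreement(s_cal, s_eval):
    """Count cases where calibration and evaluation Skill Scores share a sign."""
    s_cal = np.asarray(s_cal, dtype=float)
    s_eval = np.asarray(s_eval, dtype=float)
    agree = np.sign(s_cal) == np.sign(s_eval)
    return int(agree.sum()), int(agree.size)


# Share of agreeing cases reported in the text: 43 of 48
share = round(100.0 * 43 / 48, 1)
```

Under this assumed form, a positive Skill Score means the mean KGE of the calibrated runs exceeds that of the initial guess, and a case would be flagged with an asterisk in Figure 5 when the two periods disagree in sign.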
In an ideal modeling scenario, calibration with any variable would
improve the performance of all other variables. This did not happen in
our experiments, which may be due to uncertainties in model structure,
in parameterization, or in the observations. Previous studies have also
found significant advantages in
using RS-based model calibration to identify structural model issues
(e.g., Werth et al., 2009; Willem Vervoort et al., 2014; Winsemius et
al., 2008), detect uncertainties in input data (e.g., Milzow et al.,
2011), identify deficiencies in model parameterization (e.g., Franks et
al., 1998; Koppa et al., 2019), or increase model reliability (e.g.,
Koch et al., 2018; Manfreda et al., 2018).
According to Figure 4b and supporting information (Figure S1),
calibration with discharge improved estimates of almost all variables.
However, calibration with discharge deteriorated the performance for
vegetation ET time series. Vegetation ET estimated by MOD16 varies by at
most 30 mm/month. MGB calibration with discharge led to ET variations
of 100 mm/month, dropping to around 30 mm/month in the driest periods,
whereas MOD16 estimates do not fall below 100 mm/month in these
periods (time series in Figure 4b). Moreover, not even the seasonality
of the MGB and MOD16 time series agrees. This could
be due to relatively high uncertainties in vegetation ET estimates from
MOD16 for the Amazon basin (around 23 mm/month, according to
Gomis-Cebolla et al., 2019). Nonetheless, it could also be related to
model structural and/or parameter deficiencies, in which case the model
might be “right for the wrong reasons”. In order to identify the
source of this ET inconsistency, we have compared MOD16 and MGB results
to in-situ measurements of ET in Purus River Basin, provided by
Gomis-Cebolla et al. (2019) and Maeda et al. (2017). We found a much
stronger agreement both in seasonality and in amplitude of in-situ
observations with MOD16 estimates than with MGB model output. Hasler
& Avissar (2007) and Pan et al. (2020) have already warned about the
overestimation of dry-season water stress in hydrological models,
probably related to the misrepresentation of soil water availability for
plants. This was also found by Maeda et al. (2017), who highlighted
that ET was not water-limited because of the plants’ access to deep soil
water, which had been previously documented by Nepstad et al.
(1994). They found that, in the Southern Amazon ecotone, deep root water
intake plays a key role in maintaining ecosystem productivity during the
dry season. The MGB model is probably misrepresenting these processes,
which would have remained unknown had it been compared only to discharge
time series.
Even though the calibration with discharge observations was not able to
accurately estimate ET, calibration with the remaining variables (except
for soil moisture) was able to improve ET estimates. For instance, in
Figure 3b, ET and water level presented low correlation (r = 0.08), but
calibration with water level improved ET estimates by S = 16.9% (cal
period) and S = 25.6% (eval period). Conversely, in Figure 3b, ET and TWS
presented a higher correlation (r = 0.47), but calibration with TWS improved
ET estimates by only S = 7.9% (cal period) and S = 13.1% (eval
period).
In general, calibration with TWS had little influence on any of the
variables. In spite of some improvements, skill scores were usually low.
In turn, TWS estimates were improved relatively easily by calibration
with any variable (except for ET, for cal period; or discharge, for eval
period). These results for TWS contrast with previous work by Lo et al.
(2010), Nijzink et al. (2018), Rakovec et al. (2016), Schumacher et al.
(2018), and Werth & Güntner (2010), which highlighted the value of GRACE
data when incorporated into hydrological modeling. This may be due to
the high seasonality of Purus River Basin, in which TWS does not add
much information, biasing the calibration with high correlation values.
Even for the initial guess (uncalibrated) setup, TWS performances were
already very good: KGE values were around 0.8, while for all other
variables, except for ET (for which KGE values were negative), KGE
values were around 0.3 for the uncalibrated setup.
Flood extent and water level performances were improved by calibration
with discharge, water level and flood extent, but these calibrations had
little effect on ET (which was actually degraded by discharge
calibration) and soil moisture. This is probably because water level and
flood extent are tied to river transport processes (e.g., flood routing
and floodplain storage), while ET and soil moisture are more related to
vertical hydrological processes (e.g., soil water balance). This
highlights the complementarity between variables that relate to
different processes.
Calibration with soil moisture improves the performance of all variables
(water level to a lesser extent), except for ET. Consistently,
calibration with any variable except ET is able to improve soil moisture
to some extent.
What is the added value of complementary RS observations?
By calibrating with all variables together except Q (Figure 5b), we
found improvements for almost all variables, with the most significant
improvements for flood extent (S = 25% for cal and eval periods) and
ET (S = 20% for cal and eval periods). For discharge, performance for
the evaluation period was improved (S = 17.4%), which is important for
estimating discharge in poorly gauged basins. However, for the
calibration period, Skill Score for discharge performance was S = 1.7%,
which might reflect some limitations in retrieving discharge based on
the calibration of the RS-derived variables, as discussed previously.
Therefore, we chose a specific arrangement of two complementary
variables in order to check if this calibration setup might lead to
better retrievals for discharge and the other variables. The chosen
variables were soil moisture and water level, because of their
complementarity. Based on the Skill Score values in Figure 5-I,
calibration with water level alone improves all variables except
discharge (and soil moisture to a lesser extent), while calibration with
soil moisture alone improves all variables except ET (and water level to
a lesser extent).
The calibration arrangement of water level and soil moisture led to
improvements not only in soil moisture and water level themselves, but
also in all other variables (ET to a lesser extent). For instance, flood
extent was improved by S = 52.6% and S = 34.1% (cal and eval period,
respectively). Discharge was improved by S = 59.9%, with a resulting
mean KGE = 0.70 for the calibration period (S = 45.0% and mean KGE =
0.35 for evaluation period). These results agree with previous works
that found an improvement in model performances by multi-variable
calibration of soil moisture and evapotranspiration (e.g., Koppa et al.,
2019; López et al., 2017), discharge and evapotranspiration (e.g.,
Herman et al., 2018; Pan et al., 2018; Poméon et al., 2018), discharge
and soil moisture (e.g., Li et al., 2018; Rajib et al., 2016), discharge
and TWS (e.g., Rakovec et al., 2016; Schumacher et al., 2018; Werth &
Güntner, 2010), and discharge and water level (e.g., Kittel et al.,
2018; Schneider et al., 2017; W. Sun et al., 2012). However, it is
difficult to compare this study to previous works, because most of them
used discharge observations as constraints. In this study, we avoided
the use of discharge observations for multi-variable calibration, in
order to analyze the applicability of the RS-based calibration method
for poorly-gauged regions.
Calibration with water level and soil moisture had little influence on
ET performance because, as discussed previously, the model setup does
not represent deep root water intake during the dry season in this
watershed.
By comparing the two frameworks for multi-variable calibration (all
except Q versus h+W calibration), we found that calibration with all
variables except Q is useful to some extent, but deliberately selecting
complementary variables for model calibration resulted in the best
overall performance.
Are we getting the right results for the right sets of parameters?
When analyzing the dispersion of parameters before and after
calibration with each variable (Figure 6 for a few selected parameters,
Supporting Information Figure S2 for all calibrated parameters), it can
be observed that the range of parameter values varies widely depending
on the calibration variable. For instance, Wm is a conceptual soil
parameter related to maximum water storage in the soil. In the
calibration based on single variables (except ET) it converged to low
values (300 mm), while in the calibration with ET it reached high values
(2000 mm). This probably occurred to compensate, through
overparameterization, for a structural error in the model, i.e., the
model’s inability to represent deep root water uptake in the dry season.
Such trade-offs between model parameters during calibration have also
been reported and discussed by Koppa et al. (2019).
The surface resistance parameter also resulted in a wide range of values
depending on the calibration target variable. When calibrated with water
level, flood extent, or ‘all except Q’ experiments, it reached median
values higher than 150 s/m, but calibration with h+W led to median
values lower than 50 s/m. Surface resistance is a vegetation parameter
directly related to ET dynamics, so it is important to note that
calibration with ET was able to reduce the dispersion of this parameter,
reaching a median value of about 80 s/m (similar to calibration with Q
and W).
Another interesting result relates to the channel Manning’s coefficient,
which presented different values for each calibration experiment. This
agrees with previous findings that the Manning parameter is often used
as an effective parameter that compensates for neglected hydrodynamic
processes such as localized channel head losses, poor cross section
representation, or non-represented 2D processes (Neal et al., 2015).
Many previous studies have highlighted the use of multi-variable
calibration to narrow parameters’ search space (Nijzink et al., 2018; W.
Sun et al., 2018), but this was not observed in our results. Based on
the limited multi-variable calibration experiments performed here (‘all
except Q’ and h+W), no narrowing in parameters’ search space was found.
For most parameters (except for Wm), calibration with ‘all except Q’ and
h+W resulted in a wide range of values. This may be due to the
triplicate runs converging to different parameter sets. A more robust
experiment comparing more multi-variable calibration strategies (e.g., Q
+ different RS-based variables) might provide a better understanding of
this topic.
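One way to check whether a wide parameter range comes from the triplicate runs converging to different sets is to compare the pooled dispersion with the medians of each calibration experiment. The sketch below is only an illustration under stated assumptions: the function name `dispersion_summary` is hypothetical, and the input is assumed to be the calibrated values of one parameter ordered by experiment (e.g., 300 values, 100 per experiment).

```python
import numpy as np


def dispersion_summary(values, n_experiments=3):
    """Summarize the spread of one calibrated parameter.

    ASSUMPTION: `values` is ordered by calibration experiment, so that
    splitting it into `n_experiments` chunks recovers each experiment.
    """
    values = np.asarray(values, dtype=float)
    q25, q50, q75 = np.percentile(values, [25, 50, 75])

    # Median of each calibration experiment taken separately
    groups = np.array_split(values, n_experiments)
    group_medians = [float(np.median(g)) for g in groups]

    return {
        "median": float(q50),
        "iqr": float(q75 - q25),            # pooled interquartile range
        "group_medians": group_medians,
        # A large gap between experiment medians relative to the pooled
        # IQR suggests each experiment converged to a different region.
        "median_spread": max(group_medians) - min(group_medians),
    }


# Toy values only (not results from this study): three tight clusters
summary = dispersion_summary([1, 2, 3, 11, 12, 13, 21, 22, 23])
```

In the toy example, each experiment is internally tight while the pooled range is wide, which is the pattern that differing convergence between triplicate runs would produce.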