Figure 3. a) Sensitivity analysis of different model output variables to varying sets of parameters (hyd = hydraulics, soil, veg = vegetation, and all together). The a priori dispersion of the model parameters, for each output variable, is compared to the reported uncertainty of the in-situ/RS product estimates, previously described in the Cal/Eval data section (no uncertainty estimate is provided for the root zone soil moisture product because none is available for the Amazon region). b) Correlation matrix (Pearson coefficient) between performance metrics (KGE) for the six analyzed variables, obtained by varying all parameters together. KGE values are computed by comparing multiple runs with the reference simulation (i.e., the initial run with the initial parameter set as defined in Supporting Information Table S2). Q = discharge, h = water level, A = flood extent, TWS = total water storage anomalies, ET = vegetation evapotranspiration, W = soil moisture.

How do dispersions in model outputs relate to uncertainties in observations?

For some variables, the in-situ/RS observations have uncertainties significantly lower than the overall dispersion of the model: for example, discharge observations have an uncertainty of 25%, while the overall model parameter dispersion is ~160%. The same pattern is found for water level and TWS estimates, which implies that these observations might be useful to constrain the model. In contrast, the uncertainties in the RS products of flood extent (~50%) and vegetation ET (~23%) are of the same order of magnitude as the overall model parameter dispersion, which might hamper their contribution to model calibration.

Which sets of parameters are related to which variables?

The overall model dispersions are related to different sets of parameters: discharge, water level, and TWS are more strongly related to hydraulic and soil parameters, and to a lesser extent to vegetation parameters. Flood extent estimates are strongly related to hydraulic parameters, and less so to soil and vegetation parameters. As expected, soil moisture and vegetation ET estimates relate to vertical water balance processes and are insensitive to hydraulic parameters. Soil moisture (W) is more sensitive to soil parameters, while vegetation ET is more sensitive to vegetation parameters. These results help interpret the RS-based calibration experiments addressed in section 3.2. For instance, if calibration with ET or W were achieved through the optimization of hydraulic parameters, the model would have “gotten the right results for the wrong reasons”. The same would be true if flood extent calibration targeted soil or vegetation parameters.
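The group-wise sensitivity test discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the model or sampling scheme used in the study: `toy_model`, the parameter names, their ranges, and the dispersion measure (relative range of an output across runs) are all hypothetical and chosen only to show the mechanics of varying one parameter group at a time.

```python
import numpy as np

# Hypothetical parameter groups and a priori ranges (illustrative only).
GROUPS = {
    "hyd": {"manning": (0.02, 0.06)},
    "soil": {"storage_capacity": (100.0, 1000.0)},
    "veg": {"surface_resistance": (40.0, 120.0)},
}
# Hypothetical reference parameter set (stands in for the initial run).
REFERENCE = {"manning": 0.035, "storage_capacity": 500.0, "surface_resistance": 70.0}


def toy_model(p):
    """Toy stand-in for the hydrological model; returns two mean outputs."""
    # ET depends only on vertical water balance parameters (veg and soil).
    et = 300.0 / p["surface_resistance"] * min(1.0, p["storage_capacity"] / 800.0)
    runoff = 10.0 - et
    # Water level depends on runoff and on a hydraulic (Manning-like) parameter.
    level = (runoff * p["manning"]) ** 0.6
    return {"ET": et, "h": level}


def dispersion(values):
    """Relative range of an output across runs, in percent (assumed metric)."""
    values = np.asarray(values, dtype=float)
    return 100.0 * (values.max() - values.min()) / values.mean()


def group_sensitivity(n_runs=200, seed=0):
    """Vary one parameter group at a time, then all groups together."""
    rng = np.random.default_rng(seed)
    result = {}
    for group in list(GROUPS) + ["all"]:
        varied = GROUPS if group == "all" else {group: GROUPS[group]}
        runs = []
        for _ in range(n_runs):
            params = dict(REFERENCE)  # non-varied parameters stay at reference
            for ranges in varied.values():
                for name, (lo, hi) in ranges.items():
                    params[name] = rng.uniform(lo, hi)
            runs.append(toy_model(params))
        result[group] = {var: dispersion([r[var] for r in runs]) for var in runs[0]}
    return result


sensitivity = group_sensitivity()
for group, disp in sensitivity.items():
    print(group, {var: round(value, 1) for var, value in disp.items()})
```

In this toy setup, ET shows zero dispersion when only the hydraulic group is varied, which mirrors the insensitivity of vertical water balance variables to hydraulic parameters described in the text.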

Which variables are inter-related?

When all parameters are varied together, there is a high correlation (greater than or equal to 0.4) between the performance of discharge and flood extent, water level and flood extent, flood extent and TWS, and ET and TWS (Figure 3b). High correlations between discharge, water level, and flood extent are expected because of their strong association through river transport processes. However, the correlation between discharge and water level is lower (0.30), probably because of high uncertainties in hydraulic parameters and the large distance separating the water level virtual station from the streamflow gauge. Furthermore, the high correlation between TWS and flood extent might be related to surface water storage dynamics, which are especially relevant in regions with floodplains.
In general, a high correlation between two variables in Figure 3b should be reflected in positive results when calibrating with one variable and evaluating with the other (single-variable calibration). It may also indicate that observations of these variables are redundant if used simultaneously in a multi-variable calibration framework. However, a high correlation in Figure 3b followed by deterioration after single-variable calibration might indicate structural errors in the model or errors in the observations. We stress, however, that this study did not attempt to quantify structural errors. Conversely, a low correlation in Figure 3b followed by improved performance after calibration with multiple variables might indicate complementarity between variables.
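The computation behind a correlation matrix such as Figure 3b can be sketched as follows: the KGE of each perturbed run is computed against the reference simulation for every variable, and the KGE values are then correlated across runs. The sketch assumes the original KGE formulation (linear correlation, variability ratio, and bias ratio); the source does not state which variant was used. The synthetic series and the function names `kge` and `kge_correlation` are illustrative and not taken from the study.

```python
import numpy as np


def kge(sim, ref):
    """Kling-Gupta efficiency of `sim` against `ref` (original formulation, assumed)."""
    sim, ref = np.asarray(sim, dtype=float), np.asarray(ref, dtype=float)
    r = np.corrcoef(sim, ref)[0, 1]   # linear correlation
    alpha = sim.std() / ref.std()     # variability ratio
    beta = sim.mean() / ref.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def kge_correlation(runs, reference):
    """Pearson correlation between per-variable KGE values across runs.

    runs: dict variable -> array of shape (n_runs, n_time)
    reference: dict variable -> array of shape (n_time,)
    """
    names = list(reference)
    scores = np.array([[kge(run, reference[v]) for run in runs[v]] for v in names])
    return names, np.corrcoef(scores)  # each row of `scores` is one variable


# Synthetic example: Q and h share a perturbation, ET has its own.
rng = np.random.default_rng(42)
t = np.arange(120)
reference = {
    "Q": 10 + 5 * np.sin(2 * np.pi * t / 12),
    "h": 4 + 2 * np.sin(2 * np.pi * t / 12),
    "ET": 3 + np.cos(2 * np.pi * t / 12),
}
shared = rng.uniform(0.5, 1.5, size=50)  # scaling common to Q and h
own = rng.uniform(0.5, 1.5, size=50)     # scaling specific to ET
runs = {
    "Q": shared[:, None] * reference["Q"],
    "h": shared[:, None] * reference["h"],
    "ET": own[:, None] * reference["ET"],
}
names, corr = kge_correlation(runs, reference)
print(names)
print(np.round(corr, 2))
```

Because Q and h are driven by the same perturbation in this synthetic case, their KGE values are almost perfectly correlated, while ET, perturbed independently, is not tied to them.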

Model calibration

How does RS-based model calibration improve discharge estimates?

For the evaluation period (2006–2008 for discharge, flood extent, TWS, and ET; 2013–2014 for water level and soil moisture), calibration with all RS products led to improvements in discharge estimates (Figure 4a). For the calibration period (2009–2012), the TWS, ET, and soil moisture RS products also led to improvements in discharge estimates, while water level and flood extent led to discharge overestimation in wet periods (Figure 4a). This could be due to high uncertainties in the observations (Figure 3a), but if this were the case, it would also be reflected in poor performance for water level and flood extent when discharge is the calibration target (Figure 4b), which does not occur. Therefore, calibration with discharge leads to reasonable parameter sets for the performance of discharge itself, and also of water level and flood extent. However, it does not lead to the best hydraulic parameter arrangement, which might be achieved more successfully when calibrating with water level or flood extent.
Nonetheless, both water level and flood extent observations are representative of a specific location in the basin (Figure 2), and calibration with these variables might lead to the best parameter arrangement for these locations, but not for the whole watershed. A more spatially consistent use of these observations should improve their ability to constrain models and improve discharge estimates, as in the studies of Kittel et al. (2018), who used radar altimetry measurements at 12 locations in the basin; Schneider et al. (2017), who used data from 13 virtual stations; and Liu et al. (2015), who used water level measurements at four virtual stations and flood extent for stream segments at different locations in the basin.
RS variables such as TWS, ET, and soil moisture improved discharge estimates by S = 13.7%, S = 52.9%, and S = 27.0%, respectively (Figure 5-I, calibration period), or by S = 27.4%, S = 6.1%, and S = 12.3% (Figure 5-II, evaluation period), which is especially relevant in the context of the Prediction in Ungauged Basins initiative (Hrachowitz et al., 2013; Sivapalan et al., 2003). These results agree with previous studies, such as López et al. (2017), who found good performance in discharge estimates through model calibration with GLEAM ET and ESA CCI soil moisture, and Nijzink et al. (2018), who found improvements in discharge by using soil moisture products (AMSR-E, ASCAT) and TWS from GRACE.
The multi-variable calibration experiment considering all variables except discharge (Figure 5b) resulted in a Skill Score of S = 17.4% for discharge in the evaluation period. This is relevant for estimating discharge in poorly gauged basins. Nonetheless, for the calibration period, the Skill Score was low (S = 1.7%), reflecting some limitations when retrieving discharge, probably because of trade-offs between variables (Koppa et al., 2019). RS uncertainties could be better incorporated into the calibration, for instance by using bias-insensitive metrics (e.g., Demirel et al., 2018; Zink et al., 2018; Dembele et al., 2020), or by explicitly including them in the objective functions (Aires, 2014; Croke, 2009; Foglia et al., 2009; Peña-Arancibia et al., 2015).
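To make the Skill Score values reported above easier to interpret, the sketch below shows one common way such a score is computed. The definition of S is not given in this section, so the form used here is an assumption: the gain in KGE of the calibrated run over the reference run, normalized by the largest possible gain (a perfect KGE of 1). The function name `skill_score` and the example KGE values are hypothetical and are not results from the study.

```python
def skill_score(kge_calibrated, kge_reference):
    """Skill score in percent relative to a reference run.

    Assumed form: S = 100 * (KGE_cal - KGE_ref) / (1 - KGE_ref),
    where 1 is the KGE of a perfect simulation.
    """
    if kge_reference >= 1.0:
        raise ValueError("reference KGE must be below 1 for the score to be defined")
    return 100.0 * (kge_calibrated - kge_reference) / (1.0 - kge_reference)


# Hypothetical KGE values, for illustration only.
improved = skill_score(kge_calibrated=0.6, kge_reference=0.5)
unchanged = skill_score(kge_calibrated=0.5, kge_reference=0.5)
degraded = skill_score(kge_calibrated=0.4, kge_reference=0.5)
print(round(improved, 1), round(unchanged, 1), round(degraded, 1))
```

Under this assumed form, a positive S means the calibration moved discharge performance closer to a perfect score, S = 0 means no change relative to the reference run, and a negative S means the calibration degraded discharge performance.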