3.2.3 Multivariable regression analysis applying variable
transformations
Two transformations (logarithmic and square root) were then applied to
the predictors to identify any linear correlation between the water level
above the LiDAR level and the transformed variables. The predictors
considered by this analysis were lake area, upstream drainage area,
outflow channel width and peak discharge with a return period of 350
years. Taking the square root of the discharge values noticeably
improved the predictive performance of the model, whereas applying the
same transformation to the lake area and upstream drainage area had
little effect. The weir value was not identified as a relevant
variable. The simplest significant model uses the lake area and the
square root of the peak discharge as predictors, and has an RMSE of
0.48 m with an adjusted R squared of 0.5495. The
p-values for both predictors are of the order of
10^-5, showing a statistically significant correlation
(Table 3, Figure 4).
[Table 3]
[Figure 4]
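The fitting step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name fit_linear_model, the synthetic data and the coefficient values are all assumptions made for the example, and RMSE is assumed here to be the root of the mean squared residual (the text does not state which definition was used).

```python
import numpy as np

def fit_linear_model(predictors, response):
    """Ordinary least squares with an intercept.

    Returns (coefficients, RMSE, adjusted R squared). The intercept is
    the first coefficient. Hypothetical helper, for illustration only.
    """
    predictors = np.asarray(predictors, dtype=float)
    response = np.asarray(response, dtype=float)
    n, p = predictors.shape
    design = np.column_stack([np.ones(n), predictors])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    residuals = response - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((response - response.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R squared penalises the number of predictors p.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Assumed definition: root of the mean squared residual.
    rmse = float(np.sqrt(ss_res / n))
    return coef, rmse, adj_r2

# Synthetic, purely illustrative data (NOT the study's lakes).
rng = np.random.default_rng(0)
lake_area = rng.uniform(0.1, 50.0, size=60)
peak_discharge = rng.uniform(1.0, 400.0, size=60)
water_level = (0.2 + 0.01 * lake_area + 0.08 * np.sqrt(peak_discharge)
               + rng.normal(0.0, 0.3, size=60))

# Predictors: untransformed lake area, square root of peak discharge.
X = np.column_stack([lake_area, np.sqrt(peak_discharge)])
coef, rmse, adj_r2 = fit_linear_model(X, water_level)
print(coef, rmse, adj_r2)
```

Replacing np.sqrt with np.log in the predictor matrix gives the logarithmic variant, so the candidate transformations can be compared on adjusted R squared under the same assumptions.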
The residuals indicate that the assumption of homoscedasticity holds
for this model. Neither the error nor its absolute value shows any
trend with the predictors or with other lake characteristics (see
Figure 3 of the supplementary materials).
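A simple numerical companion to the residual plots is to correlate the absolute residuals with each predictor; values near zero are consistent with homoscedasticity. The sketch below is an informal diagnostic under stated assumptions, not the procedure used in the study: the function name abs_residual_correlations and the synthetic inputs are hypothetical.

```python
import numpy as np

def abs_residual_correlations(predictors, residuals):
    """Pearson correlation between |residual| and each predictor column.

    Hypothetical helper: a clearly non-zero value suggests that the
    error spread changes with that predictor.
    """
    predictors = np.asarray(predictors, dtype=float)
    abs_res = np.abs(np.asarray(residuals, dtype=float))
    return np.array([
        np.corrcoef(predictors[:, j], abs_res)[0, 1]
        for j in range(predictors.shape[1])
    ])

# Synthetic, purely illustrative inputs (NOT the study's data).
rng = np.random.default_rng(1)
lake_area = rng.uniform(0.1, 50.0, size=60)
sqrt_discharge = np.sqrt(rng.uniform(1.0, 400.0, size=60))
residuals = rng.normal(0.0, 0.3, size=60)  # constant spread by construction

X = np.column_stack([lake_area, sqrt_discharge])
print(abs_residual_correlations(X, residuals))
```

A formal test such as Breusch-Pagan could be substituted where a p-value is required; the correlation check is used here only because it needs nothing beyond NumPy.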
An identical multivariable regression procedure was applied using return
periods of 100 and 20 years. The results show that an equivalent model,
with slightly modified coefficients, performs well for the
100-year flow (RMSE = 0.48 m, adjusted R squared of 0.4849). However,
the adjusted R squared dropped to 0.3574 for the 20-year events,
indicating that the model was not able to predict water levels
associated with more frequent events. We hypothesise that this is
because higher-frequency events are more readily controlled by
engineered features, leading to highly unpredictable water level
behaviour.