After validation of the model framework on these case studies (see section 3.1.1), the next step was to test the model with synthetic hydrographs (necessary as the inflows to most lakes are ungauged) in order to produce water level frequency curves.
The model results from the synthetic hydrographs had to be validated against observed water level fluctuations in the lakes (section 3.1.3). Observed values were derived from the subset of water level gauging stations in Quebec with records longer than 25 years (33 stations), and the physically-based model was then tested on the gauges that also had synthetic discharge values available (31 stations). The maximum annual fluctuations were initially derived as the difference between the recorded water levels and the mean level at the corresponding station.
However, because water level gauges are not available for most lakes, the final testing phase used the water surface elevation derived from LiDAR as the baseline elevation to which water level increases were applied (on a subset of 23 stations at which all the necessary information was available). For the lakes with gauges, the average error between the recorded mean water level and the LiDAR elevation was approximately 0.50 m and the median error was about 0.25 m (see Table 1 in supplementary information). A large portion of this error is driven by a small number of reservoirs that are likely to be affected by strong seasonal regulation. Removing these stations would reduce the mean difference between the LiDAR elevation and the mean water level to 0.25 m, but the resulting figure would no longer reflect an error that affects a non-negligible portion of lakes. Because LiDAR coverage is steadily expanding and will be the main source of water level data at larger scales, the LiDAR elevation was retained as the reference. The values for each lake were fitted with an appropriate distribution to extract values at different return periods (20, 100 and 350 years).
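As an illustration of the last step, the sketch below fits a distribution to a series of annual maximum water level increases and reads off the 20, 100 and 350 year values. The choice of a Gumbel distribution fitted by the method of moments is an assumption made only for this example (the text specifies just "an appropriate distribution"), and the 26 annual maxima are invented for illustration; they are not data from the study.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Fit a Gumbel (EV1) distribution by the method of moments.

    Assumption: Gumbel is used here only as an example of a distribution
    commonly applied to annual maxima.
    """
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)  # sample standard deviation
    scale = std * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def return_level(loc, scale, return_period_years):
    """Value with annual non-exceedance probability 1 - 1/T."""
    p = 1.0 - 1.0 / return_period_years
    return loc - scale * math.log(-math.log(p))


# Hypothetical annual maximum water level increases above the baseline (m).
annual_maxima_m = [
    0.41, 0.55, 0.48, 0.62, 0.39, 0.71, 0.52, 0.45, 0.58, 0.66, 0.43, 0.80,
    0.50, 0.47, 0.61, 0.54, 0.37, 0.69, 0.57, 0.49, 0.75, 0.44, 0.53, 0.60,
    0.46, 0.64,
]

loc, scale = fit_gumbel_moments(annual_maxima_m)
levels = {T: return_level(loc, scale, T) for T in (20, 100, 350)}

for T, value in levels.items():
    print(f"{T:>3}-year water level increase: {value:.2f} m")
```

In practice the candidate distribution would be chosen per lake, for example by comparing goodness-of-fit across several families, before the return period values are extracted.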
The inputs required to run the model for each lake are the inflow discharge, the lake area and the outflow channel width. The discharge was derived from the distributed hydrological model HYDROTEL (Fortin et al. 1995; 2001) for the three return periods of interest, while the channel widths were measured manually in QGIS for each lake. The time of concentration of the inflow hydrographs was set to a fixed value of 200 hours after performing a sensitivity analysis on the model.
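To show how these three inputs can combine into a water level increase, the following is a minimal level-pool routing sketch. It is not the model used in the study: the symmetric triangular hydrograph (rising limb equal to the time of concentration), the broad-crested weir outflow law with coefficient 1.7, the constant lake area, the explicit time stepping, and all numerical values other than the 200 hour time of concentration are assumptions introduced for illustration.

```python
def triangular_hydrograph(peak_discharge, time_of_concentration_h, dt_h):
    """Inflow (m^3/s) at each time step of a symmetric triangular hydrograph.

    Assumption: the rising and falling limbs each last one time of
    concentration.
    """
    base_h = 2.0 * time_of_concentration_h
    n_steps = int(base_h / dt_h) + 1
    flows = []
    for i in range(n_steps):
        t = i * dt_h
        if t <= time_of_concentration_h:
            q = peak_discharge * t / time_of_concentration_h
        else:
            q = peak_discharge * (base_h - t) / time_of_concentration_h
        flows.append(max(0.0, q))
    return flows


def max_level_rise(inflows, dt_h, lake_area_m2, outlet_width_m, weir_coeff=1.7):
    """Largest lake level rise (m) from explicit level-pool routing.

    Assumptions: constant lake area, outflow Q = C * w * h^1.5 with h the
    level above the outlet sill, and a lake that starts at sill level.
    """
    dt_s = dt_h * 3600.0
    h = 0.0
    highest = 0.0
    for q_in in inflows:
        q_out = weir_coeff * outlet_width_m * h ** 1.5
        # Continuity: change in storage = (inflow - outflow) * dt
        h = max(0.0, h + (q_in - q_out) * dt_s / lake_area_m2)
        highest = max(highest, h)
    return highest


# Hypothetical lake: 5 km^2 area, 20 m wide outlet, 50 m^3/s peak inflow.
inflows = triangular_hydrograph(peak_discharge=50.0,
                                time_of_concentration_h=200.0, dt_h=1.0)
rise = max_level_rise(inflows, dt_h=1.0, lake_area_m2=5.0e6, outlet_width_m=20.0)
print(f"Maximum water level increase: {rise:.2f} m")
```

The explicit scheme is only stable when the time step is short relative to the lake's response time, so a small lake with a wide outlet would need a smaller `dt_h` than the one hour used here.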
Statistical model
In contrast to a physically based methodology, a statistical approach focuses on analysing the water level fluctuations at the available gauging stations across the region in order to identify the factors that drive water level increases. This is done by fitting a probability distribution to the recorded time series and linking the results to observable characteristics of the lakes, in order to identify a statistical model that can be used at ungauged locations. The analysis focused on finding plausible linear regressions that could link the water level increases to different variables, such as lake area, upstream drainage area and peak discharge. To explore the different possibilities, the analysis was carried out in three steps: single variable regression analysis, multivariable regression analysis, and multivariable regression analysis with variable transformation. Several interaction terms were also considered in order to identify a statistically significant relationship.
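The first step and the idea of variable transformation can be sketched as follows. The example fits an ordinary least squares line between a single predictor (lake area) and the water level increase, once on the raw values and once after a log transformation of the predictor, and reports the coefficient of determination for each. The data values and the choice of a natural-log transformation are hypothetical and serve only to illustrate the procedure.

```python
import math


def ols_fit(x, y):
    """Single-variable ordinary least squares: y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical gauged lakes: area (km^2) and 100-year level increase (m).
lake_area_km2 = [1.2, 3.5, 8.0, 15.0, 40.0, 95.0, 210.0, 520.0]
level_increase_m = [0.35, 0.48, 0.55, 0.71, 0.82, 1.05, 1.10, 1.42]

# Step 1: regression on the untransformed predictor.
raw_fit = ols_fit(lake_area_km2, level_increase_m)

# Step 3 (single-variable case): regression on the log-transformed predictor.
log_area = [math.log(a) for a in lake_area_km2]
log_fit = ols_fit(log_area, level_increase_m)

for label, (slope, intercept, r2) in (("raw area", raw_fit), ("log area", log_fit)):
    print(f"{label}: slope={slope:.4f}, intercept={intercept:.4f}, R^2={r2:.3f}")
```

The multivariable steps extend the same idea to several predictors and their interaction terms, which would normally be handled with a statistics package that also reports significance tests for each coefficient.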