1.1 Surrogate modelling
Water modellers have resorted to surrogate models (SMs) to replace computationally costly models. Following the classification given by Razavi et al. (2012b), SMs, also known as metamodels or reduced-order models, can be categorized as Lower-fidelity Physically-based (LFPB) surrogates or response surface (RS) surrogates. On one hand, LFPB metamodels modify the original model to reduce its computational cost. These models simplify the original model by lowering the resolution of the output (e.g., using larger time-steps) or by replacing computationally costly components with faster alternatives or complements (e.g., kriging, linear regression, neural networks (Fernandez et al., 2017)). On the other hand, RS surrogates avoid using the original model and replace it altogether with a faster-to-run alternative. In this branch of SMs, the original model is treated as an input-output function, and the metamodel is used to mimic the output surface as closely as possible. Common algorithms for approximating response surfaces include polynomial interpolation, kriging and, more recently, machine learning (ML) algorithms. The following paragraphs summarize the advantages and disadvantages of LFPB and RS metamodels according to Razavi et al. (2012b).
Lower-fidelity Physically-based (LFPB) surrogates, also known as multifidelity-based surrogates or "coarse" models, include techniques such as network simplification (Dempsey et al., 1997; Paluszczyszyn et al., 2013; Ulanicki et al., 1996) and skeletonization (Shamir et al., 2008). Compared with RS metamodels, LFPB surrogates are expected to better emulate the unexplored regions of the explanatory variable (input) space (i.e., regions far from the points previously evaluated with the high-fidelity model) and, as such, to perform more reliably in extrapolation. As for their drawbacks, LFPB models rely on the assumption that the high-fidelity and low-fidelity models share their basic features and are correlated in some way. If this assumption is not satisfied, the surrogate modelling framework may fail or yield minimal gains. Moreover, mapping the outputs from the low resolution back to the original resolution is not a trivial task, and may add complexity or uncertainty to the estimates.
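The time-step coarsening mentioned above can be illustrated with a toy example. The sketch below is purely hypothetical (it is not drawn from the cited works): a simple draining-tank ODE is integrated with explicit Euler, once with a fine time-step (the "high-fidelity" model) and once with a much coarser one (the LFPB surrogate). Both runs share the same physics, so the surrogate remains correlated with the original model while requiring far fewer steps.

```python
import numpy as np

def tank_level(dt, t_end=100.0, h0=5.0, k=0.02):
    """Explicit Euler integration of a draining tank, dh/dt = -k*sqrt(h).
    Illustrative model only: dt controls the fidelity/cost trade-off."""
    h, t, steps = h0, 0.0, 0
    while t < t_end:
        h = max(h - dt * k * np.sqrt(h), 0.0)  # one Euler step, clipped at empty
        t += dt
        steps += 1
    return h, steps

h_hi, n_hi = tank_level(dt=0.01)  # high-fidelity run: many small steps
h_lo, n_lo = tank_level(dt=1.0)   # LFPB-style surrogate: ~100x cheaper
```

In this sketch the coarse run reproduces the final level to within a few percent at a fraction of the cost, which is the trade-off LFPB surrogates exploit; in a real water-network model, the mapping of coarse outputs back to the original resolution would add the complexity noted above.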
Response surface (RS) surrogates, also known as statistical or black-box models, include techniques such as polynomials (Schultz et al., 2004), kriging (Baú & Mayer, 2006), and neural networks (Behzadian et al., 2009). Their advantages include the possibility of maintaining the fidelity of the original model, being model-independent (i.e., not requiring access to the components, such as the code or equations, of the original model), and easier implementation relative to LFPB surrogates. Nonetheless, they can be hard to train for high-dimensional problems, since building databases large enough to train the metamodels may itself entail a prohibitive computational cost. Moreover, RS metamodels require scrupulous validation to minimize the chance of over-fitting and maximize their ability to extrapolate.
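The input-output view behind RS surrogates can be sketched in a few lines. The example below is illustrative and not taken from the cited studies: a stand-in "expensive" function is sampled at a handful of design points, a polynomial response surface (one of the techniques named above) is fitted to the samples, and the cheap surrogate is then evaluated in place of the original model.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a costly simulator,
    viewed purely as an input -> output mapping."""
    return np.sin(x) + 0.5 * x

# Sample the expensive model at a small set of design points
x_train = np.linspace(0.0, 6.0, 8)
y_train = expensive_model(x_train)

# Fit a degree-5 polynomial response surface to the samples
coeffs = np.polyfit(x_train, y_train, deg=5)
surrogate = np.poly1d(coeffs)

# Evaluate the cheap surrogate instead of the expensive model
x_new = 2.5
y_pred = surrogate(x_new)
y_true = expensive_model(x_new)
```

Inside the sampled range the polynomial tracks the original function closely; outside it, the fit degrades quickly, which is precisely why the text stresses scrupulous validation and the limited extrapolation ability of RS metamodels.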