1.1 Surrogate modelling
Water modellers have resorted to surrogate models (SMs) to replace
computationally costly models. Following the classification given by
Razavi et al. (2012b), SMs, also known as metamodels or reduced-order
models, can be categorized as Lower-fidelity Physically-based surrogates
(LFPB) or response surface (RS) surrogates. On the one hand, LFPB metamodels
modify the original model to reduce its computational effort. These
models simplify the original model by lowering the resolution (e.g.,
larger time-steps) of the output or replacing computationally costly
components with faster alternatives or complements (e.g., kriging,
linear regression, neural networks (Fernandez et al., 2017)). On the
other hand, RS surrogates avoid using the original model and replace it
altogether with a faster-to-run alternative. In this branch of SMs, the
original model is treated as an input-output function, and the
metamodel is used to mimic its output surface as closely as possible. Some
of the algorithms for approximating response surfaces are polynomial
interpolation, kriging, and more recently, machine learning (ML)
algorithms. The following paragraphs summarize the advantages and
disadvantages of LFPB and RS metamodels according to Razavi et al.
(2012b).
Lower-fidelity Physically-based surrogates (LFPB), also known as
multifidelity based surrogates or “coarse” models, include techniques
such as network simplification (Dempsey et al., 1997; Paluszczyszyn et
al., 2013; Ulanicki et al., 1996), and skeletonization (Shamir et al.,
2008). Compared with RS metamodels, LFPB surrogates are expected to
better emulate the unexplored regions of the explanatory variable
(input) space (i.e., regions far from the previously evaluated points
with the high-fidelity model) and, as such, perform more reliably in
extrapolation. As for their drawbacks, LFPB models rely on the
assumption that the high-fidelity and low-fidelity models share their
basic features and are correlated in some way. If this assumption is not
satisfied, the surrogate modelling framework will not work, or will
provide only minimal gains. Moreover, mapping the outputs from low
resolution back to the original resolution is not a trivial task and may
add complexity or uncertainty to the estimates.
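To make the idea of lowering resolution concrete, the following minimal Python sketch runs the same explicit-Euler simulation of a simple linear reservoir at a fine time-step, standing in for the high-fidelity model, and at a ten-times coarser time-step, standing in for an LFPB-style surrogate. The linear-reservoir equation, parameters, and time-steps here are illustrative assumptions, not taken from the works cited above.

```python
import numpy as np

def linear_reservoir(dt, t_end=100.0, k=0.05, inflow=1.0, s0=0.0):
    """Explicit-Euler simulation of a linear reservoir dS/dt = I - k*S.

    Illustrative stand-in: a small dt plays the role of the high-fidelity
    model; a larger dt gives a cheaper, lower-resolution (LFPB-style)
    surrogate of the same physics.
    """
    n_steps = int(round(t_end / dt))
    s = s0
    states = np.empty(n_steps)
    for i in range(n_steps):
        s += dt * (inflow - k * s)   # Euler update of the storage state
        states[i] = s
    return states

# High-fidelity run (fine time-step) vs. low-fidelity run (10x coarser).
fine = linear_reservoir(dt=0.1)    # 1000 steps
coarse = linear_reservoir(dt=1.0)  # 100 steps, roughly a tenth of the work

# Compare the two series at the coarse output times.
fine_at_coarse_times = fine[9::10]
print(np.max(np.abs(fine_at_coarse_times - coarse)))
```

Because both runs solve the same equations, their outputs remain strongly correlated, which is precisely the assumption LFPB surrogates rest on; the coarse run simply trades some accuracy for a large reduction in the number of steps.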
Response surface (RS) surrogates, also known as statistical or
black-box models, include techniques such as polynomials (Schultz et
al., 2004), kriging (Baú & Mayer, 2006), and neural networks (Behzadian
et al., 2009). Some of their advantages include the possibility of
maintaining the fidelity of the original model, being model-independent
(i.e., not requiring access to the components, such as code or equations
of the original model), and easier implementation with respect to LFPB
surrogates. Nonetheless, they can be hard to train for high-dimensional
problems, which may demand prohibitive computational costs to build
databases large enough to train the metamodels. Moreover, RS metamodels
require scrupulous validation to minimize the chance of over-fitting and
maximize their ability to extrapolate.
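As an illustration of the RS workflow, the sketch below treats an "expensive" model as a pure input-output function, fits a polynomial response surface to a handful of its evaluations, and shows why validating extrapolation behaviour matters. The test function, sampling design, and polynomial degree are all assumptions chosen for readability, not taken from the cited studies.

```python
import numpy as np

def expensive_model(x):
    """Illustrative stand-in for a computationally costly simulator; the
    RS surrogate sees it purely as an input-output function."""
    return np.sin(3.0 * x) + 0.5 * x

# 1. Run the original model at a small design of input points.
x_train = np.linspace(0.0, 2.0, 12)
y_train = expensive_model(x_train)

# 2. Fit a response-surface surrogate (here, a degree-5 polynomial).
coeffs = np.polyfit(x_train, y_train, deg=5)
surrogate = np.poly1d(coeffs)

# 3. Inside the sampled region, the cheap surrogate tracks the model well.
x_in = 1.3
print(abs(surrogate(x_in) - expensive_model(x_in)))   # small error

# 4. Outside it, polynomial extrapolation can degrade rapidly, which is
#    why RS metamodels require scrupulous validation.
x_out = 3.5
print(abs(surrogate(x_out) - expensive_model(x_out)))  # large error
```

The same two-step recipe (sample the original model, then fit a fast approximator) applies unchanged when the polynomial is replaced by kriging or a neural network; only the fitting step differs.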