[Figure 2. Case study type distribution for Water Distribution Systems (WDS) and Urban Drainage Systems (UDS)]
In terms of size, most papers on UDSs do not report the number of pipes; consequently, the extent of the system was often assessed through the reported area. This suggests that when MLSMs are used, the water network itself is set aside and only the input-output relation is considered. The extent of the case study (number of pipes or area) is a proxy for its complexity, which is the relevant dimension.
Nevertheless, some applications can involve medium-sized networks of high complexity (e.g., different control elements, multiple objectives, or changing scenarios). Beyond the particular characteristics of each network and application, the metamodeling process was the same regardless of network size. However, the time required to create the database and train the model increases with the complexity of the case study. So far, the procedure does not vary as a function of this complexity; nonetheless, adapting the training process or the metamodels to it could yield better approximations to the RSs.
Since each system has a different area and number of pipes, we proposed the categorization in Table 1. The ratio of small networks to the rest is noticeably larger in WDSs than in UDSs, owing to the use of benchmarks to test the methodologies. Even though the use of metamodels is better justified in larger networks, their use decreases as the size increases.
3.3. Metamodelling Methods
Regardless of the water network type and metamodel applications, the
preferred method for metamodeling is the ANN. ANNs are computational
models based on the complex interaction of multiple individual
components (i.e., units or neurons). Each unit performs the same
procedure: receiving information, executing an operation (usually a
linear transformation of the inputs), applying a non-linear
transformation to the result (e.g., hyperbolic tangent, sigmoid,
rectified linear unit), and sending the information to the next
connected units. Each of the units has trainable parameters that
determine the relative weight of each of the inputs. Units are arranged
in layers; each ANN has at least one input layer and one output layer,
where the inputs are presented to the network and the computed outputs
are collected, respectively. Between these layers, there are one or more
hidden layers, where most of the information processing takes place.
ANNs learn to approximate the input-output relationships in the data by tuning the trainable parameters (i.e., the units' weights and biases) during the backpropagation learning process, which is usually carried out via gradient descent, computing the partial derivatives of the hidden layers with the chain rule of differentiation. For a complete review of ANNs, the reader is referred to Goodfellow et al. (2016) as a general resource and to Shen (2018) for a review specific to water resources scientists.
The analysis of the literature shows that the MultiLayer Perceptron (MLP) is the most widely used MLSM. The MLP is a specific ANN architecture consisting of a series of layers in which all the units of one layer are connected to all the units in the previous and next layers; hence, it is also known as the fully connected ANN. Most of the studies reviewed in this paper used this architecture with a single hidden layer, mainly due to its simplicity, high speed, and accuracy. Still, ANNs can be customized to increase the accuracy of certain applications. This practice of creating deep networks, i.e., networks with more layers and units per layer, is part of modern deep learning (Goodfellow et al., 2016).
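As an illustration, a one-hidden-layer MLP surrogate of the kind described above can be sketched in a few lines of PyTorch (Paszke et al., 2019); the dimensions, data, and hyperparameters below are hypothetical placeholders, not values from any reviewed study.

import torch
import torch.nn as nn

# Hypothetical dimensions: e.g., 30 decision variables in, 1 performance metric out.
n_inputs, n_hidden, n_outputs = 30, 50, 1

# One-hidden-layer, fully connected MLP: each layer applies a linear
# transformation followed by a non-linear activation.
mlp = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),   # trainable weights and biases
    nn.Tanh(),                       # non-linear transformation (hyperbolic tangent)
    nn.Linear(n_hidden, n_outputs),  # output layer, where results are collected
)

# Backpropagation learning: gradient descent on a regression loss, with the
# partial derivatives of the hidden layer obtained via the chain rule.
optimizer = torch.optim.SGD(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x, y = torch.randn(1000, n_inputs), torch.randn(1000, n_outputs)  # placeholder data
for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(mlp(x), y)  # forward pass
    loss.backward()            # compute gradients
    optimizer.step()           # update weights and biases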
In WDSs, two studies varied the number of layers: Sayers et al. (2019) used two hidden layers for design optimisation, while Yoon et al. (2020) used 15 layers in their ANN to estimate network performance after earthquake events. Deep networks may increase performance, but they are more prone to overfitting and require more training time and examples. Moreover, it is not possible to know a priori the number of layers and units that yields the best performance. For example, Modesto De Souza et al. (2021) tested multiple MLP architectures for pressure estimation in a WDS. Their results suggest that the optimal number of layers is two, but this can vary for other applications. On the other hand, UDSs present more variation in the implemented MLPs, including varying the number of hidden layers (Berkhahn et al., 2019; Kim & Han, 2020; Raei et al., 2019), changing the activation function to a radial basis function (She & You, 2019; Vojinovic et al., 2003), and adding fuzzy logic (Keum et al., 2020).
As previously stated, MLPs are the most popular MLSM. This is not surprising given their ease of implementation and success in multiple applications, as well as the hype from the AI community. However, the MLP, and ML methods in general, present several drawbacks. As Razavi et al. (2012a) indicated in their numerical assessment of metamodelling strategies in computationally intensive optimisation, “the likelihood that a metamodel-enabled optimizer outperforms an optimizer without metamodelling is higher when a very limited computational budget is available; however, this is not the case when the metamodel is a neural network. In other words, neural networks are severely handicapped in limited computational budgets, as their effective training typically requires a relatively large set of design sites, and thus are not recommended for use in these situations.” Therefore, the use of an ANN may even harm the development of an application. In the same work, the authors show that there are cases in which it is better not to use a metamodel and rely on the original model instead. Consequently, they recommend further research on determining when a metamodeling approach is worth pursuing. In recent years, the widespread availability of parallel computing (e.g., cloud computing and graphics processing units) and user-friendly Deep Learning libraries, such as Pytorch (Paszke et al., 2019), has largely reduced this problem.
Even though MLPs are the most popular choice from the set of ML tools, they are not the only one. For example, Pasha & Lansey (2014) used support vector machines (SVMs) to improve the real-time estimation of water tank levels and thus decrease pump energy consumption in a WDS. In UDSs, Chiang et al. (2010) implemented an early form of recurrent neural network (RNN) for water level predictions at gauged and ungauged sites. According to the authors, this architecture was chosen because of its higher performance. However, the main disadvantages of this architecture lie in training difficulty (Pascanu et al., 2013) and computational costs (Strubell et al., 2020).
Similarly, Kim et al. (2019) and She & You (2019) leveraged the time structure in rainfall time series for real-time flood prediction with nonlinear autoregressive network with exogenous inputs (NARX) neural networks. This architecture is a feedforward ANN that calculates the next value of a time series as a function of both past input and output values. In each study, the authors tailored the model to the conditions of their problem: Kim et al. (2019) added a second verification step to account for values that incur serious inundation damage, and She & You (2019) implemented a NARX neural network for the monotonic parts of a hydrograph (i.e., the ascending and descending stages) and a radial basis function MLP for the non-monotonic interval (i.e., around the peak).
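In general form, a NARX metamodel computes y(t) = f(y(t-1), ..., y(t-p), x(t), x(t-1), ..., x(t-q)), where f is the feedforward ANN, x are the exogenous inputs (e.g., rainfall), y is the predicted series (e.g., water level), and p and q are the chosen numbers of output and input lags.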
3.3.1 Metamodel inputs and outputs
The inputs to the metamodels in UWN applications are usually decision and explanatory variables, while the outputs vary based on the scope of the problem. Based on the inputs used in the reviewed papers, there is no single variable consistent across the different applications in either type of water network; inputs are problem-specific. For example, flood prediction in UDSs relies on rainfall time series, while the design of WDSs relies on inputs such as pipe diameters and chlorine dosing rates.
On the other hand, the outputs of the metamodels are usually state variables of the UWN or performance metrics. For example, a metamodel can be developed to estimate a pressure-dependent metric, such as the Network Resilience Index (NRI) (Prasad & Park, 2004), or it can output the pressures in a WDS that are then used to compute the NRI. Other examples of surrogated components are the water level in storage units and pump energy consumption; other examples of overall metrics are sampling accuracy (Behzadian et al., 2009), the economic cost of interventions, greenhouse gases, reliability, and vulnerability (Beh et al., 2017).
Determining the output and scope of the metamodel entails deciding whether the metamodel should emulate the model itself or one of the objectives computed after the hydraulic simulation. The reader is referred to Broad et al. (2015) for a complete methodology on metamodel scope for risk-based optimisation and its application to WDS design. In contrast, there are no applications of objective approximation using MLSMs in UDSs.
Inspecting the dimensions (i.e., number) of the inputs and outputs reveals a converging trend: the number of inputs is higher than the number of outputs. This is no surprise, since most of the studies estimate one or two target values that summarize the desired state of the network (e.g., overall performance, minimum chlorine concentration, total flooding volume) from multiple decision and state variables. Nevertheless, some authors have used fewer variables to produce more outputs. For example, in WDSs, Lima et al. (2018) and Meirelles et al. (2017) estimated pressures at 118 nodes from known pressures at only 3 nodes, while Kim et al. (2019) predicted urban floods in multiple nodes from a single rainfall time series.
On the dimensionality of ANNs, having multiple inputs and outputs allows accounting for more complexity in the applications; nonetheless, both come with downsides. On the input side, Razavi et al. (2012b) argue against using a large number of explanatory variables (>20), since the minimum number of training examples can become excessively large. On the output side, the number of variables is also recommended to be kept low. In theory, the number of output variables is not restricted; indeed, this is one advantage of ANNs over other RS metamodels, as they can act as multi-output emulators. However, an ANN with multiple outputs will seek a compromise between the errors of all the outputs, which might hurt the overall accuracy of the MLSM. For this reason, an alternative approach is to train one ANN per output variable. Since each objective then has its own metamodel, the accuracy increases, but so does the training time. As noted by Andrade et al. (2016), choosing between one multi-output ANN and multiple single-output ANNs depends on the problem at hand. The size of the water network is the most important factor, since for small systems the two options perform equivalently. In addition, the choice should consider the desired accuracy, the available metamodeling time, and the required speed of execution.
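To make the two options concrete, a minimal PyTorch sketch follows; the dimensions are hypothetical placeholders.

import torch.nn as nn

n_in, n_hid, n_out = 10, 32, 5  # hypothetical dimensions, e.g., 5 output variables

# Option A: a single multi-output ANN; the training loss compromises
# between the errors of all five outputs.
multi_output = nn.Sequential(
    nn.Linear(n_in, n_hid), nn.ReLU(), nn.Linear(n_hid, n_out))

# Option B: one single-output ANN per variable; accuracy per output tends to
# increase, but so does the total training time (roughly n_out times).
per_output = [
    nn.Sequential(nn.Linear(n_in, n_hid), nn.ReLU(), nn.Linear(n_hid, 1))
    for _ in range(n_out)
]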
3.3.2 Metamodel Performance
Regarding the performance of a metamodel, the most important characteristics are computational speed and prediction accuracy. The computational saving is reported as a reduction of the time the application would have taken running the original model. This quantity was reported by nearly half of the reviewed studies; on average it was higher than 90%, and most of the time over 98%. This is a satisfactory indication, since the purpose of these SMs is to reduce the computational burden of intensive applications. Nonetheless, around half of the studies did not report this saving. Although quantifying the computational saving is not always easy, future researchers who use a metamodel are encouraged to provide such an estimate. Since the design and training time could exceed the expected saved time, an estimate of the potential saving aids the decision of whether to build a metamodel.
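Although the reviewed papers do not report a standard formula, the saving can be expressed as saving (%) = 100 × (1 − t_SM / t_OM), where t_OM is the time the application takes with the original model and t_SM the time with the metamodel in its place; for a conservative estimate, t_SM should also include the time spent generating the training database and training the model.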
In terms of prediction accuracy, multiple indicators are used by researchers to assess the fidelity of the ML algorithm to the original model. Common metrics include the root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NSE), mean absolute error (MAE), and Pearson correlation coefficient. This multitude of metrics hinders a direct comparison between models or applications, but overall it is possible to observe good fits between the metamodel and the original model. It is worth noting that the metamodel reflects reality only as far as the original model is capable of doing so: metamodels are second-level abstractions and therefore can only be as accurate as the original model.
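For reference, these common metrics can be computed with a few lines of NumPy; a minimal sketch, where y denotes the original model's outputs and y_hat the metamodel's predictions:

import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def nse(y, y_hat):
    # Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 matches the mean predictor.
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def pearson(y, y_hat):
    return np.corrcoef(y, y_hat)[0, 1]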
In addition to the previously mentioned criteria, Razavi et al. (2012b) include development time, and Asher et al. (2015) add surrogate-introduced uncertainty as assessment metrics. For these criteria, seven of the reviewed papers calculated or referred to the time it took to train the models, and only five performed an analysis of the metamodels' robustness. Given the versatility and multipurpose nature of SMs, there are other performance indicators, e.g., ease of development, explainability, generalization, or re-trainability. The reviewed papers largely disregard these indicators, since the development of the metamodel is specific to each case study and its implementation details often go unreported. These indicators remain secondary to computational saving and accuracy, which constitute the most relevant metrics in the literature and in this review.
4 Current issues in metamodelling
Based on the current status presented in the previous section, the following issues were identified.
4.1. Basic applications
MLSMs have been used to tackle various issues, namely optimisation, uncertainty analyses, real-time applications, state forecasting, and aiding LFPB metamodels. Although these generally addressed relevant problems, each of the reviewed papers had a basic framing: the inputs deal with few design or input variables (e.g., diameters, chlorine dosage, accumulated rainfall), and the outputs are usually summary variables (e.g., critical pressure, chlorine residual, flood volume). This approach is understandable for several reasons. First, most of the time the simplifications still retain sufficient problem information to find an adequate solution. Second, it avoids problems related to high dimensionality in the inputs and outputs. Lastly, it allows researchers to introduce their metamodeling method without interference from excessive complexity.
Although these framings are effective, they can prove simplistic given the complexity of water networks. Considering a small set of interventions may discard other types and combinations of interventions (e.g., allowing not only changes in diameters but also the addition of pumps, or both at the same time). Furthermore, other changes in the network or its components, or even interactions with other city systems, could be explored. However, these are rarely considered since they represent a challenge for traditional RS metamodels; current MLSMs are very specific to the cases on which they are trained. Because of this, new approaches are required, mainly in optimisation and uncertainty analysis.
As seen in Section 3, the most popular application for MLSMs is optimisation. In this application, multiple authors (Beh et al., 2017; Doorn, 2021; Kapelan et al., 2005; Razavi et al., 2021) have remarked on the importance of considering new objectives. For example, robustness in the design of water systems, especially under deep uncertainty, requires considering multiple scenarios to which it is not possible to assign a probability or ranking. This analysis is desirable because water networks are systems with long service lifespans. Nonetheless, objectives like robustness tend to be more computationally intensive; therefore, their need for metamodels increases.
A relevant missing layer of complexity is uncertainty analysis, especially for UDSs. The current design practice is to use a single benchmark storm and assume it is representative of the future rain events the system will face. However, two UDSs with similar performance during a design event could behave very differently under other rainfall patterns. According to Ng et al. (2020), a final design based on a single strong storm does not guarantee optimal performance during long mild storms or successions of frequent small events. Naturally, the authors recognize that a design considering multiple events would increase the computational effort, but they also suggest implementing SMs to deal with this difficulty.
4.2 Case studies: Lack of benchmarking with complex networks
Benchmark water networks are open-access datasets that contain the information necessary to create models of a system. Such a dataset consists of the topology of the network and its components and, depending on the system, may incorporate leakages, demand patterns, cyber-attacks, rainfall, or surveillance data. Benchmarks are used as reference points to compare the performance of models and algorithms. Here, it is necessary to distinguish between synthetic and real data: even though synthetic data allow implementing and comparing algorithms, they may not reflect all the processes that real data can account for.
There is a clear difference between the two types of infrastructure in the number of networks used, since benchmark networks are not as available for UDSs as for WDSs. In water distribution, there is a set of water networks called the Water Distribution System Research database, created by the ASCE Task Committee on Research Databases for WDSs and hosted by the University of Kentucky (2013). It offers benchmarks for multiple problems in categories such as network expansion, operation, and design, allowing modellers to easily obtain data for the development and comparison of algorithms on networks of different sizes. On the other hand, there is no consolidated set of benchmark networks for UDSs, let alone an entire structured database. This is attributable to factors such as the difficulty of taking measurements in sewer environments and, according to Pedersen et al. (2021), the limited interest of utility companies in making the datasets publicly available. Consequently, all the applications on UDSs were developed entirely for real cases, which is positive for bridging theoretical approaches and practice, but hampers the development of algorithms because comparisons are difficult and the particularities of each system must be accounted for.
Regarding the size of the case studies, most of the systems on which the MLSMs were used were medium or small. Metamodels are most useful in problems with long computational times, that is, in applications with large water networks. In the case of WDSs, a common practice to test the effectiveness of a method is to develop a metamodel for a small benchmark network and then repeat the same steps to create a metamodel for a large real case. Even though this practice is reasonable, it assumes the response surfaces of both networks are comparable or similar. This is not necessarily the case, as reported by Andrade et al. (2016), who noted contrasting accuracies between large and small case studies when training metamodels. Exploring solution spaces is already an issue when using metamodels, independently of the network, as reported by Broad et al. (2005), but large networks represent additional challenges that grow in complexity in a non-linear manner.
4.3 Machine learning and multi-layer perceptron limitations
Although the MLP is not the only ML technique, it is the most popular one among MLSMs. Given that its structure allows it to address multiple types of problems, it has become a one-size-fits-all model. Nevertheless, it presents multiple issues, namely the curse of dimensionality, its black-box nature, and its rigid structure. These three shortcomings respectively 1) hinder its use for high-dimensionality problems, 2) limit confidence in its approximations, and 3) prevent the transferability of trained models across different case studies.
4.3.1 Curse of dimensionality - Metamodeling time
The curse of dimensionality indicates that, for a certain level of accuracy, the required amount of data increases exponentially with the dimensions of a problem (Keogh & Mueen, 2017). Naturally, this problem can be addressed by reducing the number of input dimensions (i.e., fewer explanatory variables) using prioritization based on experience, knowledge of the task, or an automatic procedure such as principal component analysis (PCA). However, as noted by Maier et al. (2014), for real-world problems reducing the number of input features may not be a satisfactory solution, because it usually leads to an approximation that could exclude optimal zones and prevent the algorithms from finding optimal solutions. Given this situation, searching for solutions on the algorithmic side may yield better answers.
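As an illustration of the automatic route, a minimal scikit-learn sketch of PCA-based input reduction follows; the dimensions and data are hypothetical placeholders.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(500, 40)       # 500 training samples, 40 explanatory variables
pca = PCA(n_components=0.95)      # keep the components explaining 95% of variance
X_reduced = pca.fit_transform(X)  # fewer input features for the metamodel
# Caveat (Maier et al., 2014): the discarded directions may contain optimal zones.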
SMs have worked adequately so far, but future metamodels are likely to increase in complexity, either because UWNs themselves become more complex or because the number of input (more design choices/explanatory variables) or output (more objectives) dimensions grows. Both drivers increase the size of the metamodels and, consequently, the number of required training examples. Since the original models are already expensive to run, creating a large training dataset might be unfeasible in the first place; the metamodeling time would become the obstacle. This time is usually disregarded, as some authors consider it negligible compared to the posterior computational gain in the application. Nevertheless, it is important in high-dimensional search spaces, as noted by Razavi et al. (2012b), since the number of design samples required to train the metamodel could already be prohibitively large.
4.3.2 Black box nature - Deterministic and obscure outputs
Two of the most recurrent criticisms of ML models are their lack of uncertainty estimation and their lack of transparency, i.e., little or no ability to explain the results they obtain. Both are overlooked aspects of metamodeling in the context of UWNs. MLSMs return a unique answer without uncertainty bands or possibilities to explain the combination of inputs that led to the final outputs. For SMs, these issues are not major concerns; nevertheless, addressing them benefits the applications in which the SMs are used.
Regarding uncertainty estimation, a few papers (Raei et al., 2019; Rosin et al., 2021; She & You, 2019; W. Zhang et al., 2019) estimated the effect of including a metamodel in their respective applications. Not accounting for this uncertainty can lead to bad approximations of the actual response surface and to suboptimal or unfeasible solutions. Authors have dealt with this difficulty by performing sensitivity analyses (e.g., Raei et al., 2019) or by training multiple models in parallel on slightly different datasets and averaging their outputs; for example, Rosin et al. (2021) developed a committee of ANNs with this approach. However, such analyses require extra considerations that may increase the metamodeling time. Some guidelines have been given for the pre-treatment (Broad et al., 2015) and post-treatment (Broad et al., 2005a) of these SMs, but there is still a lack of focus on managing uncertainty during treatment, i.e., developing a model that directly considers uncertainty. Algorithms from the branch of robust ML may aid the direct incorporation of metamodel uncertainty quantification, whether it comes from the data (Wong & Kolter, 2019) or the model (Loquercio et al., 2020).
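A minimal sketch in the spirit of the committee approach (not the exact implementation of Rosin et al. (2021); sizes and data are placeholders): K networks are trained on bootstrap-resampled datasets, and the spread of their predictions serves as an uncertainty estimate.

import torch
import torch.nn as nn

def train(model, x, y, epochs=100):
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()

x, y = torch.randn(1000, 10), torch.randn(1000, 1)  # placeholder training data
committee = []
for k in range(5):                             # K = 5 committee members
    idx = torch.randint(0, len(x), (len(x),))  # bootstrap resampling of the dataset
    member = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    train(member, x[idx], y[idx])
    committee.append(member)

x_new = torch.randn(1, 10)
with torch.no_grad():
    preds = torch.stack([m(x_new) for m in committee])
mean, std = preds.mean(0), preds.std(0)  # prediction and uncertainty band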
Although robust learning allows estimating the uncertainty of a result, it cannot explain why; this is the area of explainable ML. For water networks' SMs, being able to explain the results would help to understand the relationship between the decision variables and the objective function for the particular network being surrogated, for example, which pipes (or combinations of them) play a key role in the resilience or flooding of a water network. There is a growing interest in the AI community in explainable models to gain insights (Bhatt et al., 2020), ensure scientific value (Roscher et al., 2020), and develop trust in the outcomes of ML models (Dosilovic et al., 2018).
4.3.3. Rigid architecture - Specific case use
One disadvantage of MLSMs is the high degree of specialization of the trained metamodel. As seen before, these metamodels achieve high accuracy on the data for which they were trained. However, once trained, they become specific and rigid: their structure limits their use for other tasks in the same system or for similar applications in other water networks. The metamodel can be run several times on the same water network, but performing the same operation on a different system requires a new metamodel trained from scratch. This is not desirable, since the training process could consume most of the computational budget, especially in large case studies.
One solution is to leverage the training process of other models through transfer learning, decreasing the number of examples needed to train a new model. Situations in which transfer learning is desirable are changes in the water network composition, metamodeling of similar systems, and changes in the behaviour of the surrogated system. The first accounts for scenarios in which components (e.g., pipes, pumps, or tanks) are added to or removed from the system; even though the system changes, it is still related enough to leverage a model pre-trained on that water network. Similarly, two networks can share enough resemblance (e.g., a subsystem of another network, two skeletonized networks, or two networks with similar topology or geography) that it makes sense to use an SM from one as a pre-trained SM for the other. Lastly, the case in which the system's behaviour changes so that the metamodel no longer applies, also known as concept drift, can be addressed using transfer learning as well; here, the two related water networks are the same network at two different periods.
4.4. Gaps in Knowledge
Based on the above critical analyses of metamodels and the issues identified, the following key gaps in knowledge are summarised here:
1. Lack of depth in the optimisation of complex objectives and in uncertainty analysis for water networks using MLSMs. There are still additional and more complex objectives that can be optimised with the aid of MLSMs, for instance, robustness and interventions under deep uncertainty.
2. Lack of benchmark water networks, especially for UDSs and complex cases. First, this hinders the development and comparison of algorithms across studies; second, research is still lacking on how the response surface changes as the complexity of the water system increases, especially for large systems.
3. Current MLSMs' limitations prevent advanced metamodeling applications. MLSMs can easily grow in size when the complexity of the response surface increases, most applications do not consider the uncertainty added by the metamodel, and their structure makes them rigid and not (re)usable for other cases.
5 Research directions
Based on the identified gaps, three main lines for future research are suggested. They consider the current and future needs of applications in UWNs as well as the potential of MLSMs to meet them.
5.1 Advanced applications
The current needs for adaptable water infrastructure stem from drivers such as population growth, urbanization, and climate change. As indicated in the UN-Water report “Water and Climate Change”, taking adaptation and mitigation measures benefits water resources management and improves the provision of water supply and sanitation services. In addition, it contributes to combating both the causes and the impacts of climate change while contributing to meeting several of the Sustainable Development Goals (UNESCO, 2020). In UWNs, multi-objective optimisation and uncertainty analysis play a key role in the search for adaptation measures and in decision making, and MLSMs can help improve and accelerate their implementation.
Optimisation applications will grow in the number and complexity of their inputs and outputs. Increasing the number of inputs, i.e., decision variables and design interventions (e.g., nature-based solutions), makes it possible to explore more alternatives, consider uncertainty, or assess multiple scenarios. On the other hand, the outputs of optimisation are leaning towards complex objectives such as multi-objective robustness (e.g., Kasprzyk et al., 2013), multiple technical performance metrics (e.g., Fu et al., 2013), pro-active maintenance (Kumar et al., 2018), complex water quality indicators (Jia et al., 2021), and human values (Doorn, 2021). Multi-objective optimisation allows identifying solutions that balance trade-offs among objectives, for instance, cost and resilience (Wang et al., 2015). Naturally, considering more objectives increases the computational load, especially when those objectives are computationally expensive (e.g., robustness). In previous phases of research on optimisation, metamodels were seen as an aid, but as optimisation gradually evolves to consider additional and more complex objectives, metamodels become indispensable (e.g., Beh et al., 2017).
Regarding uncertainty analysis, fast, reliable, and flexible metamodels are necessary that can adapt to the multiple conditions and criteria under which the systems are evaluated. Traditionally, simplified models have been preferred for this task; however, RS metamodels become appealing alternatives when dealing with more complex objective functions and original models. Metamodels should play a key role in the development of frameworks for robustness-driven design. This application has major implications for UDSs, since no reviewed MLSM study focused on uncertainty analysis, even though the evidence suggests the criteria for the design of these systems are not necessarily robust (Ng et al., 2020). Although uncertainty analysis entails an intrinsic increase in the computational effort, the benefits it brings outweigh the challenges it represents. According to the IPCC (2021b), UDSs are expected to receive more intense rainfall events based on climatic projections, but considerable uncertainty remains.
The community should further research combined RS-LFPB applications to further integrate MLSMs with physically-based models and accelerate the underlying hydrodynamic engines. Likewise, physically-based models could be hybridized by incorporating an ML model that corrects the outputs of the original model for higher accuracy, accounting for the real behaviour of the system. Looking ahead, ML algorithms could detach from the physically-based model and replace its functioning with a cheaper-to-run version based on increasingly available real-world data (e.g., digital twins for UWNs (IWA, 2021)).
5.2 Benchmarking and large network behaviour
The lack of benchmark models is a gap already identified by Maier et al. (2014), who set out the characteristics and recommendations for valuable benchmarks, including non-trivial real-world problems with a representative range of the decision problems characteristic of water systems. The review shows that UDSs lack such benchmarks. To overcome this issue, we recommend implementing an approach similar to that of the Kentucky database, with applications such as real-time control, outflow, and flood prediction. For WDSs, it is appropriate to enlarge the current databases to account for new objectives, interventions, performance metrics, and real case examples. Regarding metamodels, the benchmarks should also include a reference model against which to compare computational saving and accuracy, with suggested performance metrics such as NSE, RMSE, or the number of model executions.
As Goodfellow et al. (2016) indicate, the availability of benchmark databases with real cases is one of the reasons why deep learning has recently become a crucial technology in several disciplines. In AI, datasets went from hundreds or thousands of examples in the early 1980s to millions of examples after 2010. Nowadays, thanks to the increasing connectivity and digitalization of our society, many ML algorithms can be fed with the information they require to achieve high accuracy. Since ML and DL models depend on their training sets, their success goes hand in hand with the size and quality of the available datasets, preferably containing real information. The UWNs' research community is taking its first steps in this direction. One example is the Bellinge dataset (Pedersen et al., 2021), covering the UDS of a suburb of Odense, Denmark, which is now available for “independent testing and replication of results from future scientific developments and innovation within urban hydrology and urban drainage system research”. This dataset includes 10 years of asset data (information from manholes and links), sensor data (level, flow, and power meters), rain data, hydrodynamic models (MIKE urban and EPA SWMM), and other information. Similar examples are needed to enable the exploration of metamodels' responses in networks of different characteristics (e.g., size, connectivity, slope).
As for the size of the networks, further research is required to assess the response surface of large networks. Specifically, new benchmark datasets should also include complex network cases, either large networks or medium-sized cases with high complexity. Considering that the larger the network, the longer it takes to generate and use the training data, significant efforts are required on this matter. Metamodels could help reduce the computational times that obstruct studying the response surface of large and complex systems. Nonetheless, new metamodels are required that account for the complexity of these cases while using as few training scenarios as possible.
5.3 Unexplored advanced metamodeling technologies
ML is the area with the highest growth in academic output in recent years. However, the field of MLSMs for UWNs has not yet considered the new tools and algorithms recently developed by researchers in fundamental AI and other applied disciplines. These advancements include DL architectures that express assumptions about the data in the ANNs, yielding robust, interpretable, and transferable models. This new wave of AI formalizes the attempts to add knowledge about the modelled processes as well as to extract knowledge from the results.
5.3.1 Inductive bias – Deep learning: Graph Neural Networks
The curse of dimensionality can be addressed by including inductive biases. Following the work of Battaglia et al. (2018), we define the inductive bias as the “expression of assumptions about either the data-generating process or the space of solutions”. Inductive bias can also be embedded in the architecture of the model by leveraging the inner structure of the data, which could be spatial, temporal, or relational. Exploiting the structural information of the data can reduce the number of parameters, and consequently the required training examples, through parameter sharing and sparsity of connections. The data structure gives information about the similarity of the data points in a relevant dimension (e.g., distance, time, connection); in that sense, similar data can be treated analogously (parameter sharing) and dissimilar data can remain unrelated (sparse connectivity).
Inductive bias nudges a learning algorithm to prioritize some solutions
over others. This allows finding high-performing solutions more easily
than when it is not considered. Ideally, involving inductive bias
improves the search for solutions without compromising the performance,
as long as the right inductive bias is chosen; otherwise, it can lead to
suboptimal performance (Battaglia et al., 2018). For example, when
surrogating the pressure at the nodes of a WDS with a neural network
(e.g., Broad et al., 2005; Meirelles et al., 2017) there are multiple
metamodel solutions, i.e., architectures with specific parameter values
that can approximate the response surface described by the training
data. Nevertheless, when adding inductive bias, the set of possible
solutions shrinks to a subset of solutions that comply with predefined
characteristics, for example, having graph structure, following physical
laws, or agreeing with measurements.
The most common components in DL are fully connected, convolutional, recurrent, and, more recently, graph layers. Fully connected layers have a weak inductive bias, while each of the remaining types exploits some relation or invariance in the data. Convolutional layers, typical of convolutional neural networks (CNNs), leverage regular structures such as grids (e.g., images) and connect information according to Euclidean closeness. Recurrent neural networks (RNNs) consist of recurrent units that consecutively process data sequences, such as time series, and connect information according to sequential similarity. Graph neural networks (GNNs), in turn, extend DL methods to non-Euclidean data, such as graphs, where entities are connected by relations or, in graph terminology, nodes are connected by edges.
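To illustrate the relational inductive bias, a minimal sketch of one message-passing (graph convolution) layer in plain PyTorch follows; the adjacency matrix and feature sizes are hypothetical placeholders. The same weights are shared by all nodes, and each node is updated only from its neighbours, which is where the parameter sharing and sparse connectivity discussed above come from.

import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One message-passing step: aggregate neighbour features, then transform."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.linear = nn.Linear(n_in, n_out)  # the same weights for every node

    def forward(self, h, adj):
        # h: node features (n_nodes, n_in); adj: normalized adjacency matrix.
        # Each node is updated only from the nodes it is connected to.
        return torch.relu(self.linear(adj @ h))

# Hypothetical UWN graph: 100 junctions with 4 features each (e.g., demand,
# elevation); in practice, adj would be built from the pipe connectivity.
h = torch.randn(100, 4)
adj = torch.eye(100)  # placeholder adjacency
h_next = GraphLayer(4, 16)(h, adj)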
Given their relational inductive bias, GNNs are the most suitable DL architecture for applications in UWNs, since the natural structure of these systems is a graph. Researchers have already exploited graph-theoretical concepts to develop decomposition models of WDNs (Deuerlein, 2008), assess the resilience of sectorized WDNs (Herrera et al., 2016), and identify critical elements in UWNs (Meijer et al., 2018, 2020). Furthermore, there are already some applications of GNNs in UWNs. In WDSs, Tsiami & Makropoulos (2021) employed this architecture for cyber-physical attack detection using a graph created from the sensors in the water system. In UDSs, Belghaddar et al. (2021) applied it to database completion for wastewater networks.
Operating on the graph domain allows GNNs to leverage the pre-existing network topology of the data, and the architecture has gained considerable attention in recent years due to its ability to include the relational structure of connected entities. Even though GNN outputs remain hard to explain, there are efforts to generate explanations for them, e.g., GNNExplainer (Ying et al., 2019). As noted by Battaglia et al. (2018), “the entities and relations that GNNs operate over often correspond to things that humans understand (such as physical objects), thus supporting more interpretable analysis and visualization”. In this way, GNNs are not entirely explainable, but they are more explainable than other DL architectures.
It is also possible to combine layer types in problems with more than one structure, as in the case of UWNs, which have temporal, spatial, and topological variability. An example of the application of such graph models to civil infrastructure was developed by Sun et al. (2020), who included the spatial and temporal relations of a road network for traffic forecasting. This infrastructure has multiple parallels with UWNs, including its graph connectivity, spatial-temporal variability, and human interaction. Another similar infrastructure with more examples is power systems, for which GNNs have been used in key tasks such as fault scenario analysis, time series prediction, power flow calculation, and data generation (Liao et al., 2021). For an in-depth review of GNN architectures, the reader is referred to Zhou et al. (2018).
This architecture presents an opportunity to leverage the inherent structure of the data generated in UWNs to decrease the number of parameters and, consequently, the required training data, enabling the creation of SMs for larger networks and for more numerous and complex objectives. By conditioning the characteristics of the solutions, the metamodels gain the ability to generalize to similar cases. For example, pipe changes in a network configuration could be better represented with a GNN-based metamodel: such a GNN SM could adjust itself without modifying its underlying structure, whereas metamodels that do not consider this inductive bias would probably need to be rebuilt.
5.3.2 Third wave of Artificial Intelligence
The US Defense Advanced Research Projects Agency (DARPA, 2016) separates
the different phases of AI into three waves. The first wave refers to
the past approaches and the birth of AI, the second wave is the current
and popular phase of high-performing black boxes, and lastly, the third
wave is proposed for the future of AI with models leaning towards
robustness and explainability.
Robustness refers to the ability to include uncertainty in the calculation of the outputs of a model: the user receives not a single deterministic answer but a range of possible values, usually represented by an expected value (e.g., the mean) and a measure of uncertainty (e.g., the variance). According to Gawlikowski et al. (2021), methods for estimating uncertainty in ANNs can be split into four types: single deterministic methods, Bayesian methods, ensemble methods, and test-time augmentation methods. Each of these families offers an estimate of the degree to which the neural network is certain of its output. This aspect is relevant for quantifying how likely the metamodel is to detach from the response surface, which may lead, depending on the application, to omitting optimal solutions, missing outflows, or underestimating floods. Recommended methods for implementation in MLSMs include Bayesian neural networks (e.g., Zhu & Zabaras, 2018) and single deterministic methods, the latter being recommended for the low additional computational burden they entail.
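As one illustration, Monte Carlo dropout, a commonly used approximation to Bayesian inference in ANNs, yields an expected value and a variance from repeated stochastic forward passes; a minimal sketch with hypothetical layer sizes:

import torch
import torch.nn as nn

# Hypothetical surrogate with a dropout layer kept active at prediction time.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.1),
                    nn.Linear(64, 1))

def predict_with_uncertainty(net, x, n_samples=100):
    net.train()  # keep dropout stochastic during the forward passes
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    return preds.mean(0), preds.var(0)  # expected value and uncertainty measure

mean, var = predict_with_uncertainty(net, torch.randn(1, 10))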
Research in explainability has also gained popularity in recent years. In the case of MLSMs, having an explainable model would allow us to better understand the response surface of the original model or the solution space. An improved comprehension of the response surface would facilitate obtaining better insight into the behaviour of different algorithms (e.g., evolutionary methods), ultimately contributing to determining which type of heuristic is best suited to each water network application, a topic of which we still have very little understanding (Maier et al., 2014). On the other hand, explaining the solution space would allow gaining insight into which components (and combinations of components) in the real system affect its performance and, most importantly, how they affect it. This could drive interventions in the physical water network to improve its performance. Recommended models in this category are GNNs, as already reported by Tsiami & Makropoulos (2021), who were able to perform a removal analysis to quantify the contribution of each considered component (e.g., valves, tanks, and pumps) of the physical water network to the model's performance. Since the structure of a GNN resembles the underlying system, it is possible to relate events in the metamodel to the actual system.
5.3.3 Transferrable AI models
The studies reviewed in this paper presented methodologies for training a metamodel to surrogate a computationally expensive model. Although such a methodology is transferrable, meaning the steps can be followed and repeated to obtain a similar metamodel in another case study, the metamodel itself cannot be transferred to a new case study. This implies that all the metamodeling time spent on training is specific to each case. Through transferrable models, authors may develop not only methodologies but also pre-trained SMs, which can be adapted to other cases, lowering the amount of training needed for the new network.
Having a transferrable model would allow training the metamodel with data not only from the case study at hand but also from other real and synthetic cases, for example, the benchmark datasets discussed previously. This increase in available training information is expected to improve the performance of the metamodel, or even to make it feasible in cases where data are scarce, for example, very computationally expensive UWNs in which training examples are costly. Once again, inductive bias plays a role: since the assumptions added to the algorithm delimit a smaller solution space, the ML models can be used as pre-trained solutions for other tasks. In the AI domain, this practice is referred to as transfer learning. Transfer learning is mainly implemented for specialized deep learning methods, i.e., architectures with a strong inductive bias. It has been successfully implemented in applications such as the diagnosis of medical images using CNNs (Vogado et al., 2018), the prediction of air pollutants using RNNs (Hang et al., 2020), and bioinformatics as well as social-network classification tasks with GNNs (Verma & Zhang, 2019), among others (Weiss et al., 2016).
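A minimal sketch of the practice for an MLSM, assuming a surrogate pre-trained on a related water network; the checkpoint file name and layer sizes are hypothetical placeholders.

import torch
import torch.nn as nn

# Surrogate pre-trained on a related water network (hypothetical checkpoint).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
model.load_state_dict(torch.load("pretrained_surrogate.pt"))

# Freeze the early layers, which are assumed to encode reusable structure...
for param in model[:2].parameters():
    param.requires_grad = False

# ...and fine-tune only the remaining layers on the (much smaller) dataset
# generated for the new case study.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)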
For transferrable SMs in UWNs, GNNs seem to be the natural option, given the agreement between the structure of the real system and the inductive bias of GNNs. In the same way that CNNs learn filters that are independent of the input (i.e., images), GNNs learn filters that can be used across cases (e.g., water networks). Adding structure and physics to the metamodel incorporates more domain knowledge into the ANN and improves its generalization capabilities. A relevant example of such a model is the mass-conserving RNN for rainfall-runoff modelling developed by Hoedt et al. (2021), in which the parameters of the model resemble the mass conservation principle, increasing accuracy and improving interpretability. At the same time, transferability opens the door to new applications, such as the online optimisation of interventions, by learning the effect of changes in the topology and components of the network.
Using physical information, such as the knowledge embedded in hydrodynamic models, also allows generating hybrid and general models. These models bridge the best of two domains: physics-based and data-driven. On this, Vojinovic et al. (2003) indicated that “the major advantage of integrating both a deterministic (numerical) model and a stochastic (data-driven) model over using the stochastic data-driven model alone is that the already available deterministic model quality is exploited and improved, instead of starting from scratch and throwing away all knowledge.” Furthermore, combining domain knowledge with transferable models opens the possibility of creating general models. This type of model detaches from the dataset on which it was trained, so that its predictions can be applied in unseen scenarios. Following this trend, Kratzert et al. (2019) developed a recurrent ANN trained on basins from a continental dataset using meteorological time series data and static catchment attributes, and were able to outperform hydrological benchmark models calibrated on individual catchments. The analogous application in UWNs would be an ML-based hydrodynamic model trained on a set of distribution or drainage systems that can generalize to independent, unknown water networks. Such “DeEPANET” or “DeepSWMM” models could be developed by leveraging the inductive bias of GNNs and accounting for the time dimension with recurrent layers or an encoder-decoder architecture (Du et al., 2020).
6 Conclusions
This work reviews the current state of the application of MLSMs in urban water networks and proposes promising directions based on recent and successful developments in ML.
In terms of purpose, the main uses of MLSMs in UWNs are optimisation and real-time problems. Even though MLSMs accelerate optimisation algorithms by increasing the speed of individual iterations, these algorithms have multiple disadvantages: the training process can be time-consuming, and the required size of the dataset cannot be known a priori, as it depends on the complexity of the input-output mapping. Regarding case study type, the UWNs to which MLSMs are applied vary in size and type. To analyse the complexity of the case studies, we preferred to consider WDSs and UDSs separately. In WDSs, the papers follow a clear pattern: development and trials are usually carried out on medium or small benchmark networks, and the metamodel is subsequently implemented on a large real network. UDSs, on the other hand, have no applications on benchmark networks due to their lack of availability. In terms of the metamodel, except for some applications of SVMs or RNNs, the vast majority of applications used the MLP as SM. This method has been successfully implemented due to its high accuracy and its flexibility regarding the inputs and outputs it can map. Nevertheless, MLSMs present multiple drawbacks that may even harm the development of an application; it is advisable to consider whether an MLSM is worthwhile before starting its training.
Based on the reviewed literature, the following issues and gaps in knowledge were identified regarding the limitations of existing MLSMs. These problems include limitations of the MLSMs themselves, lack of depth in current applications, and insufficient benchmarking datasets.
- Regarding metamodels' limitations, current MLSMs have the following issues: they can easily grow in size when the complexity of the response surface increases, most applications do not consider the uncertainty added by the metamodel, and their structure makes them rigid and not (re)usable for other cases.
- In terms of applications, optimisation is where most of the SMs are
currently used; nevertheless, there are still additional and more
complex objectives that can be optimised with the aid of MLSMs, for
instance, robustness and interventions under deep uncertainty.
- On case studies, the reviewed papers reveal two main issues: first, there is a lack of UDS benchmarks, which hinders the development and comparison of algorithms across studies; and second, these metamodels still lack research on how the response surface changes as the complexity of the water system increases, especially for large systems.
The following research directions are suggested to address the above key
gaps in knowledge:
- Regarding metamodeling methods, further research is required on advanced metamodeling techniques that include inductive bias, robustness, and transferability. The notion of inductive bias allows leveraging prior information to reduce the required training samples. Examples of this bias include adding physical laws, coherence with sensor data, or considering the underlying structure of the data (space, time, or topology). In this regard, the recently developed GNNs resemble the existing architecture of urban water networks and offer the closest fit to the data in these systems. Furthermore, the new approach for AI models is to focus on robustness and explainability, which offer insight into the applications and opportunities for improvement in the actual systems. Moreover, implementing these new ML architectures as SMs would allow transfer learning, i.e., the ability to use pre-trained models and save computational budget.
- On applications, additional efforts are encouraged in two areas in which metamodels will be increasingly required: uncertainty analysis and multi-objective optimisation, especially when robustness metrics are used as optimisation objectives. Further research is also required on other, less developed applications, namely real-time predictions, state estimation, and, to a lesser extent, LFPB complements. These applications have been minimally explored, and most of them have only been used for a specific type of water network.
- Regarding case study type, it is crucial to develop benchmark UWNs, especially of UDSs and complex networks. These data will facilitate training, testing, and comparing new metamodels. The new benchmarks could incorporate information on leakages, demand patterns, cyber-attacks, rainfall, or surveillance data, as well as performance metrics as reference points for comparison.
Exploring the potential of MLSMs for approximating UWNs' components and correcting predictions with real data can lead to independent ML models of water networks that leverage both physical domain knowledge and measurements. New MLSMs are encouraged to leverage the inductive bias offered by the increasing data to help UDS and WDS operators. The new advancements in ML, especially GNNs, have great potential to advance surrogate modelling in UWNs. Water network modellers will be able to speed up calculations for larger and more complex cases, enabling the design of more robust and overall better urban water systems.