Figure 2. Case study type distribution for Water Distribution Systems (WDS) and Urban Drainage Systems (UDS)
For UDSs, most papers do not report the number of pipes; consequently, the extent of the system was often assessed by the reported area. This suggests that when MLSMs are used, the water network itself is set aside and only the input-output relation is considered. The extent of the case study (number of pipes or area) is a proxy for its complexity, which is the relevant dimension. Nevertheless, some applications can involve medium-sized networks with high complexity (e.g., different control elements, multiple objectives, changing scenarios, among others). Beyond the particular characteristics of each network and application, the metamodeling process was the same regardless of the size of the network. However, the time required for creating the database and training the model increases with the complexity of the case study. So far, the procedure does not vary as a function of the complexity of the case study; nonetheless, adapting the training process or the metamodels to the complexity of the case study could yield better approximations of the RSs.
Since each system has a different area and number of pipes, we proposed the categorization in Table 1. The ratio between the number of small networks and the rest is noticeably larger in WDSs than in UDSs due to the use of benchmarks to test the methodologies. Even though the use of metamodels is better justified in larger networks, their use decreases as the size increases.
3.3. Metamodelling Methods
Regardless of the water network type and metamodel application, the preferred method for metamodeling is the ANN. ANNs are computational models based on the complex interaction of multiple individual components (i.e., units or neurons). Each unit performs the same procedure: receiving information, executing an operation (usually a linear transformation of the inputs), applying a non-linear transformation to the result (e.g., hyperbolic tangent, sigmoid, rectified linear unit), and sending the information to the next connected units. Each unit has trainable parameters that determine the relative weight of each of its inputs. Units are arranged in layers; each ANN has at least one input layer and one output layer, where the inputs are presented to the network and the computed outputs are collected, respectively. Between these layers, there are one or more hidden layers, where most of the information processing takes place. ANNs learn to approximate the input-output relationships in the data by tuning the trainable parameters (i.e., the units' weights and biases) during the backpropagation learning process, which is usually carried out via gradient descent, computing the partial derivatives of the hidden layers with the chain rule of differentiation. For a complete review of ANNs, the reader is referred to Goodfellow et al. (2016) for a general resource and Shen (2018) for a review specific to water resources scientists.
The analysis of the literature shows that the MultiLayer Perceptron (MLP) is the most widely used MLSM. The MLP is a specific ANN architecture that consists of a series of layers in which all the units of a layer are connected to all the units in the previous and next layers; hence it is also known as the fully connected ANN. Most of the reviewed studies used this architecture with one hidden layer, mainly due to its simplicity, high speed, and accuracy. Still, ANNs can be customized to increase the accuracy of certain applications. This practice of creating deep networks, i.e., with more layers and units per layer, is part of modern deep learning (Goodfellow et al., 2016).
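As an illustration, a minimal sketch of such a one-hidden-layer MLP surrogate is shown below (in PyTorch; the dimensions, placeholder data, and training settings are illustrative and not taken from any of the reviewed studies):

```python
import torch
import torch.nn as nn

# Minimal sketch: a one-hidden-layer MLP that maps decision/explanatory
# variables (e.g., pipe diameters) to a summary output (e.g., minimum pressure).
n_inputs, n_hidden, n_outputs = 10, 32, 1

mlp = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),   # linear transformation of the inputs
    nn.Tanh(),                       # non-linear activation (hyperbolic tangent)
    nn.Linear(n_hidden, n_outputs),  # output layer
)

# X, y would come from runs of the original hydraulic model (the training database);
# random placeholders are used here.
X = torch.rand(500, n_inputs)
y = torch.rand(500, n_outputs)

optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):             # backpropagation via gradient descent
    optimizer.zero_grad()
    loss = loss_fn(mlp(X), y)
    loss.backward()                  # partial derivatives computed with the chain rule
    optimizer.step()
```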
In WDSs, there are two cases of variation in the number of layers: Sayers et al. (2019) used two hidden layers for design optimisation, while Yoon et al. (2020) used 15 layers in their ANN to estimate network performance after earthquake events. Deep networks may increase performance but are more prone to overfitting and require more training time and examples. Moreover, it is not possible to know a priori the number of layers and units that yields the best performance. For example, Modesto De Souza et al. (2021) tested multiple architectures of an MLP for pressure estimation in a WDS. Their results suggest that the optimal number of layers is two, but this can vary for other applications. On the other hand, UDSs present more variation in the implemented MLPs, including varying the number of hidden layers (Berkhahn et al., 2019; Kim & Han, 2020; Raei et al., 2019), changing the activation function to a radial basis function (She & You, 2019; Vojinovic et al., 2003), and adding fuzzy logic (Keum et al., 2020).
As previously stated, MLPs are the most popular MLSM. This is not surprising given their ease of implementation and success in multiple applications, as well as the hype from the AI community. However, MLPs, and ML methods in general, present several drawbacks. As Razavi et al. (2012a) indicated in their numerical assessment of metamodelling strategies in computationally intensive optimisation, “the likelihood that a metamodel-enabled optimizer outperforms an optimizer without metamodelling is higher when a very limited computational budget is available; however, this is not the case when the metamodel is a neural network. In other words, neural networks are severely handicapped in limited computational budgets, as their effective training typically requires a relatively large set of design sites, and thus are not recommended for use in these situations.” Therefore, the use of an ANN may even harm the development of an application. In that same work, the authors show that there are cases for which it is better not to use a metamodel and run the original model instead. Consequently, they recommend further research on determining when it is worth pursuing a metamodeling approach. In recent years, the widespread availability of parallel computing (e.g., cloud computing and graphics processing units) and user-friendly deep learning libraries, such as Pytorch (Paszke et al., 2019), have largely reduced this problem.
Even though using MLPs is the most popular choice from the set of ML tools, it is not the only one. For example, Pasha & Lansey (2014) used support vector machines (SVMs) to improve the real-time estimation of water tank levels and thus decrease pump energy consumption in a WDS. In UDSs, Chiang et al. (2010) implemented an early form of recurrent neural network (RNN) for water level predictions at gauged and ungauged sites. According to the authors, their decision to use this architecture was motivated by its increase in performance. However, the main disadvantages of this architecture lie in training difficulty (Pascanu et al., 2013) and computational costs (Strubell et al., 2020).
Similarly, Kim et al. (2019) and She & You (2019) leveraged the time structure in rainfall time series for real-time flood prediction with nonlinear autoregressive with exogenous inputs (NARX) neural networks. This architecture is a feedforward ANN that calculates the next value of a time series as a function of both past input and past output values. In each study, the authors tailored the model to the conditions of their problem. Kim et al. (2019) added a second verification step to account for values that incur serious inundation damage, and She & You (2019) implemented a NARX neural network for the monotonic parts of a hydrograph (i.e., ascending and descending stages) and a radial basis function MLP for the non-monotonic interval (i.e., around the peak).
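To make the NARX formulation concrete, the following sketch shows one way the lagged inputs could be assembled; the variable names, lag orders, and synthetic series are illustrative assumptions, not the configurations used by the cited authors:

```python
import numpy as np

def narx_features(exog, target, in_lags=3, out_lags=2):
    """Build NARX-style training pairs: the next output value is predicted
    from past exogenous inputs (e.g., rainfall) and past outputs (e.g., water level).
    Illustrative sketch; lag orders are placeholders."""
    start = max(in_lags, out_lags)
    X, y = [], []
    for t in range(start, len(target)):
        past_inputs = exog[t - in_lags:t]       # u(t-1), ..., u(t-in_lags)
        past_outputs = target[t - out_lags:t]   # y(t-1), ..., y(t-out_lags)
        X.append(np.concatenate([past_inputs, past_outputs]))
        y.append(target[t])
    return np.array(X), np.array(y)

# Synthetic series as placeholders; a feedforward ANN (e.g., the MLP sketched
# above) can then be trained on the resulting (X, y) pairs.
rain = np.random.rand(1000)
level = np.convolve(rain, np.ones(5) / 5, mode="same")
X, y = narx_features(rain, level)
```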
3.3.1 Metamodel inputs and outputs
The inputs to the metamodels in UWN applications are usually decision and explanatory variables, while the outputs vary based on the scope of the problem. Based on the inputs used in the reviewed papers, there is no single variable consistent across the different applications in either type of water network; they are problem-specific. For example, flood prediction in UDSs relies on rainfall time series, while the design of WDSs relies on inputs such as pipe diameters and chlorine dosing rates. On the other hand, the outputs of the metamodels are usually state variables of the UWN or performance metrics. For example, a metamodel can be developed to estimate a pressure-dependent metric, such as the Network Resilience Index (NRI) (Prasad & Park, 2004), or it can output the pressures in a WDS, which are then used to compute the NRI. Other examples of surrogated components are the water level in storage units or pump energy consumption. Other examples of overall metrics are sampling accuracy (Behzadian et al., 2009), the economic cost of interventions, greenhouse gases, reliability, and vulnerability (Beh et al., 2017).
Determining the output and scope of the metamodel entails deciding whether the metamodel should emulate the model itself or one of the objectives computed after the hydraulic simulation. The reader is referred to Broad et al. (2015) for a complete methodology on metamodel scope for risk-based optimisation and its application to WDS design. In contrast, there are no applications of objective approximation using MLSMs in UDSs.
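The scope decision can be illustrated schematically as follows; the surrogate objects and the `nri_from_pressures` helper are hypothetical placeholders, and the actual NRI formulation is not reproduced here:

```python
import torch

def nri_from_pressures(pressures):
    """Hypothetical placeholder for a pressure-dependent metric (e.g., the NRI);
    the minimum pressure is used as a simple stand-in, not the real formula."""
    return pressures.min(dim=1).values

# Option 1: the metamodel emulates state variables (pressures at all nodes),
# and the objective is computed afterwards from its outputs.
def objective_via_states(state_surrogate, decisions):
    pressures = state_surrogate(decisions)      # shape: (batch, n_nodes)
    return nri_from_pressures(pressures)

# Option 2: the metamodel emulates the objective directly.
def objective_direct(objective_surrogate, decisions):
    return objective_surrogate(decisions)       # shape: (batch, 1)
```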
By inspecting the dimensions (i.e., number) of the inputs and outputs, a converging trend is visible: the number of inputs is higher than the number of outputs. This is no surprise since most of the studies estimate one or two target values that summarize the desired state of the network (e.g., overall performance, minimum chlorine concentration, total flooding volume) from multiple decision and state variables. Nevertheless, some authors have used fewer variables to produce more outputs. For example, in WDSs, Lima et al. (2018) and Meirelles et al. (2017) estimated the pressure at 118 nodes from known pressures at only 3 nodes, while Kim et al. (2019) predicted urban floods at multiple nodes with a single rainfall time series.
Regarding the dimensionality of ANNs, having multiple inputs and outputs allows accounting for more complexity in the applications; nonetheless, both come with downsides. For the input dimensions, Razavi et al. (2012b) argue against using a large number of explanatory variables (>20) since the minimum number of training examples can become excessively large. On the other side of the model, it is also recommended to keep the number of output variables low. In theory, the number of output variables is not restricted; moreover, this is one advantage of ANNs over other RS metamodels, as they can act as multi-output emulators. However, an ANN with multiple outputs will seek a compromise between the errors of all the outputs, which might hurt the overall accuracy of the MLSM. For this reason, an alternative approach is to train one ANN for each output variable. Since each objective has its own metamodel, the accuracy increases, but so does the training time. As noted by Andrade et al. (2016), the choice between one multi-output ANN and multiple single-output ANNs depends on the problem at hand. The size of the water network is the most important factor since, for small systems, the results with one or multiple ANNs are equivalent in performance. In addition, the choice of one model or the other should consider the desired accuracy, available metamodeling time, and required speed of execution.
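The two alternatives can be sketched as follows (hypothetical dimensions; either option would then be trained as in the earlier MLP sketch):

```python
import torch.nn as nn

n_inputs, n_hidden, n_outputs = 10, 32, 4

# Alternative 1: one multi-output ANN, whose single loss compromises
# among the errors of all outputs.
multi_output_model = nn.Sequential(
    nn.Linear(n_inputs, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_outputs)
)

# Alternative 2: one single-output ANN per output variable, usually more
# accurate per output but with a longer total training time.
per_output_models = [
    nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 1))
    for _ in range(n_outputs)
]
```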
3.3.2 Metamodel Performance
Regarding the performance of a metamodel, the most important characteristics are computational speed and prediction accuracy. The computational saving is reported as a reduction of the time that the application would have taken by running the original model. This quantity was reported by nearly half of the reviewed studies and was on average higher than 90%, most of the time over 98%. This is a satisfactory indication since the purpose of these SMs is to reduce the computational burden of intensive applications. Nonetheless, around half of the studies did not report this saving. Although quantifying the computational saving is not always easy, future researchers who use a metamodel are encouraged to report such an estimate. Since the design and training time could exceed the expected saved time, having an estimate of the potential saving aids the decision of whether to build a metamodel.
In terms of prediction accuracy, there are multiple indicators used by researchers to assess the fidelity of the ML algorithm to the original model. Common metrics include the root mean squared error (RMSE), Nash-Sutcliffe efficiency coefficient (NSE), mean absolute error (MAE), and Pearson correlation coefficient. This multitude of metrics hinders a direct comparison between models or applications, but overall it is possible to observe good agreement between the metamodel and the original model. It is worth noting that the metamodel will reflect reality only as much as the original model is capable of doing so. Metamodels are second-level abstractions and therefore may only be as good as the original model in terms of accuracy.
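For reference, a compact sketch of how these agreement metrics and the reported computational saving could be computed is given below; the function names are placeholders:

```python
import numpy as np

def accuracy_metrics(y_model, y_surrogate):
    """Common agreement metrics between the original model and the metamodel.
    Inputs are arrays of the same quantity evaluated by both models."""
    err = y_surrogate - y_model
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    nse = 1 - np.sum(err ** 2) / np.sum((y_model - y_model.mean()) ** 2)
    r = np.corrcoef(y_model, y_surrogate)[0, 1]   # Pearson correlation
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "Pearson r": r}

def computational_saving(time_original, time_surrogate):
    """Computational saving as commonly reported: percentage reduction in runtime."""
    return 100 * (1 - time_surrogate / time_original)
```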
In addition to the previously mentioned criteria, Razavi et al. (2012b) include development time, and Asher et al. (2015) add surrogate-introduced uncertainty as assessment metrics. For these criteria, seven of the reviewed papers calculated or referred to the time it took to train the models, and only five performed an analysis of the metamodels' robustness. Given the versatility and multipurpose nature of SMs, there are other performance indicators, e.g., ease of development, explainability, generalization, or re-trainability. The reviewed papers largely disregard these indicators since the development of the metamodel is specific to each case study and implementation details often go unreported. These indicators are secondary to computational saving and accuracy, which remain the most relevant metrics used in the literature, including this review.
4 Current issues in metamodelling
Based on the current status presented in the previous section, the following issues were identified.
4.1. Basic applications
MLSMs have been used to tackle various issues, namely, optimisation, uncertainty analyses, real-time applications, state forecasting, and aiding LFPB metamodels. Although these generally addressed relevant problems, each of the reviewed papers had a basic framing, i.e., the inputs deal with few design or input variables (e.g., diameters, chlorine dosage, accumulated rainfall) and the outputs are usually summary variables (e.g., critical pressure, chlorine residual, flood volume). This approach is understandable for several reasons. First, most of the time the simplifications still retain sufficient problem information to find an adequate solution. Second, it avoids problems related to high dimensionality in the inputs and outputs. Lastly, it allows researchers to introduce their metamodeling method without interference from excessive complexity.
Although these framings are effective, they may prove simplistic given the complexity of water networks. Considering only a small set of interventions may discard relevant types and combinations of interventions (e.g., allowing not only changes in diameters but also the addition of pumps, or both at the same time). Furthermore, other changes in the network or its components, or even interactions with other city systems, could be explored. However, these are rarely considered since they represent a challenge for traditional RS metamodels; current MLSMs are very specific to the cases on which they are trained. Because of this, new approaches are required, mainly in optimisation and uncertainty analysis.
As seen in Section 3, the most popular application for MLSMs is optimisation. In this application, multiple authors (Beh et al., 2017; Doorn, 2021; Kapelan et al., 2005; Razavi et al., 2021) have remarked on the importance of considering new objectives. For example, robustness for designing water systems, especially under deep uncertainty, requires considering multiple scenarios to which it is not possible to assign a probability or ranking. This analysis is desirable because water networks are systems with long service lifespans. Nonetheless, objectives like robustness tend to be more computationally intensive; therefore, the need for metamodels increases.
A relevant missing layer of complexity is uncertainty analysis, especially for UDSs. The current practice for designing these systems is to use a single benchmark storm and assume it is representative of the future rain events the system will face. However, two UDSs with similar performance during a design event could behave very differently under other rainfall patterns. According to Ng et al. (2020), a final design based on a single strong storm does not guarantee optimal performance during long mild storms or a succession of frequent small events. Naturally, the authors recognize that performing a design considering multiple events would increase the computational effort, but they also suggest the implementation of SMs to deal with this difficulty.
4.2 Case studies: Lack of benchmarking with complex networks
Benchmark water networks are open-access datasets that contain the necessary information to create models of a system. They consist of the topology of the network and its components and, depending on the system, may incorporate leakages, demand patterns, cyber-attacks, rainfall, or surveillance data. Benchmarks are used as reference points to compare the performance of models and algorithms. Here, it is necessary to distinguish between synthetic and real data. Even though synthetic data allow implementing and comparing algorithms, they may not reflect all the processes that real data can account for.
There is a clear difference between the two types of infrastructure in the number of networks used, since benchmark networks for UDSs are not as available as for WDSs. In water distribution, there is a set of water networks called the Water Distribution System Research Database. The ASCE Task Committee on Research Databases for WDS created this database, which is hosted by the University of Kentucky (2013). There are benchmarks for multiple problems in categories such as network expansion, operation, and design. This allows modellers to easily obtain data for the development and comparison of algorithms on networks of different sizes. On the other hand, there is no consolidated set of benchmark networks for UDSs, let alone an entire structured database. This is attributable to factors such as the difficulty of taking measurements in sewer environments and, according to Pedersen et al. (2021), the little interest of utility companies in making the datasets publicly available. Consequently, all the applications on UDSs were developed entirely for real cases, which is positive for bridging theoretical approaches and practice, but hampers the development of algorithms, due to the difficulty of comparison and the need to account for the particularities of each system.
Regarding the size of the case studies, most of the systems in which the MLSMs were used were medium or small. Metamodels are most useful in problems with large computational times, that is, in applications with large water networks. In the case of WDSs, a common practice to test the effectiveness of a method is to develop a metamodel for a small benchmark network and then use the same steps to create a metamodel for a large real case. Even though this practice is reasonable, it assumes the response surfaces of both networks are comparable or similar. However, this is not necessarily the case, as reported by Andrade et al. (2016), who noted contrasting accuracies between large and small case studies when training metamodels. Exploring solution spaces is already an issue when using metamodels, independent of the network, as reported by Broad et al. (2005), but large networks represent additional challenges that increase in complexity in a non-linear manner.
4.3 Machine learning and multi-layer perceptron limitations
Although the MLP is not the only ML technique, it is the most popular one among MLSMs. Given that its structure allows it to address multiple types of problems, it has become a one-size-fits-all model. Nevertheless, it presents multiple issues, namely, the curse of dimensionality, its black-box nature, and its rigid structure. These three shortcomings respectively 1) hinder its use for high-dimensionality problems, 2) limit confidence in its approximations, and 3) prevent the transferability of trained models across different case studies.
4.3.1 Curse of dimensionality - Metamodeling time
The curse of dimensionality indicates that, for a certain level of accuracy, the required amount of data increases exponentially as the dimensions of a problem increase (Keogh & Mueen, 2017). Naturally, this problem can be addressed by reducing the number of input dimensions (i.e., fewer explanatory variables) using prioritization based on experience, knowledge of the task, or some automatic procedure such as principal component analysis (PCA). However, as noted by Maier et al. (2014), for real-world problems reducing the number of input features may not be a satisfactory solution because it usually leads to an approximation that could exclude optimal zones and prevent the algorithms from finding optimal solutions. Given this situation, searching for solutions on the algorithmic side may yield better answers.
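As a simple illustration of the automatic route, the following sketch reduces a placeholder input matrix with PCA before metamodel training, keeping in mind the caveat by Maier et al. (2014) that such reductions may exclude optimal zones:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative sketch: reducing the input dimensionality of the training
# database with PCA before fitting a metamodel. X is a placeholder matrix of
# decision/explanatory variables (samples x features).
X = np.random.rand(500, 40)

pca = PCA(n_components=0.95)        # keep components explaining 95% of the variance
X_reduced = pca.fit_transform(X)    # lower-dimensional inputs for the metamodel
print(X_reduced.shape)
```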
The SMs have worked adequately so far, but future metamodels are likely to increase in complexity. This is due either to an increase in the complexity of UWNs or to an increase in the number of input (more design choices/explanatory variables) or output (more objectives) dimensions. Both drivers increase the size of the metamodels and consequently the number of required training examples. Since the original models are already expensive to run, creating a large training dataset might be unfeasible in the first place. The metamodeling time would become the obstacle. This time is usually disregarded since some authors consider it negligible compared to the subsequent computational gain in the application. Nevertheless, this time is important in high-dimensional search spaces, as noted by Razavi et al. (2012b), since the number of design samples required to train the metamodel could already be prohibitively large.
4.3.2 Black box nature - Deterministic and obscure outputs
Two of the most recurrent criticisms of ML models are their lack of uncertainty estimation and their lack of transparency, i.e., little or no ability to explain the results they obtain. Both are overlooked aspects of metamodeling in the context of UWNs. MLSMs return a single answer without uncertainty bands or the possibility of explaining the combination of inputs that led to the final outputs. For SMs, these issues are not major concerns; nevertheless, addressing them aids the applications in which the SMs are used.
Regarding uncertainty estimation, a few papers (Raei et al., 2019; Rosin et al., 2021; She & You, 2019; W. Zhang et al., 2019) estimated the effect of including a metamodel in their respective application. Not accounting for this uncertainty can lead to poor approximations of the actual response surface and suboptimal or unfeasible solutions. Authors have dealt with this difficulty by performing sensitivity analysis (e.g., Raei et al., 2019) or by training multiple models in parallel with slightly different datasets and averaging the outputs of the models. For example, Rosin et al. (2021) developed a committee of ANNs with this approach. However, this analysis requires extra considerations which may increase the metamodeling time. Some guidelines have been given for the pre-treatment (Broad et al., 2015) and post-treatment (Broad et al., 2005a) of these SMs, but there is still a lack of focus on managing uncertainty during treatment, i.e., developing a model that directly considers uncertainty. Algorithms in the branch of robust ML may aid in the direct incorporation of metamodel uncertainty quantification, whether it comes from the data (Wong & Kolter, 2019) or the model (Loquercio et al., 2020).
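A minimal sketch of such a committee (in the spirit of the approach used by Rosin et al. (2021), though not their implementation) is shown below; architectures, data, and the number of members are placeholders:

```python
import torch
import torch.nn as nn

def make_mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 32), nn.ReLU(), nn.Linear(32, n_out))

def train(model, X, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(X), y).backward()
        opt.step()
    return model

# Committee of ANNs: each member is trained on a slightly different
# (bootstrapped) dataset; the spread of their predictions gives a rough
# indication of metamodel uncertainty. Data are random placeholders.
X = torch.rand(500, 10)
y = torch.rand(500, 1)

committee = []
for _ in range(5):
    idx = torch.randint(0, len(X), (len(X),))          # bootstrap resample
    committee.append(train(make_mlp(10, 1), X[idx], y[idx]))

with torch.no_grad():
    preds = torch.stack([m(X) for m in committee])     # (members, samples, 1)
    mean, std = preds.mean(dim=0), preds.std(dim=0)    # prediction and its spread
```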
Although robust learning allows estimating the uncertainty of a result, it cannot explain why that result was obtained. This is the area of explainable ML. For water networks' SMs, being able to explain the results would help to understand the relationship between the decision variables and the objective function for the particular network being surrogated, for example, understanding which pipes (or combinations of them) play a key role in the resilience or flooding of a water network. There is a growing interest in the AI community towards explainable models to gain insights (Bhatt et al., 2020), ensure scientific value (Roscher et al., 2020), and develop trust in the outcomes of ML models (Dosilovic et al., 2018).
4.3.3 Rigid architecture - Specific case use
One disadvantage of MLSMs is the high degree of specialization of the trained metamodel. As seen before, these metamodels achieve high accuracies on the data for which they were trained. However, once trained, they become specific and rigid. Their structure limits their use for other tasks in the same system or for similar applications in other water networks. The metamodel can be run several times on the same water network, but performing the same operation in a different system requires a new metamodel, which must be trained from scratch. This is not desirable since the training process could consume most of the computational budget, especially in large case studies.
One solution is to leverage the training process of other models with transfer learning to decrease the number of examples needed to train a new model. Situations for which transfer learning is desirable are changes in the water network composition, metamodeling of similar systems, and changes in the behaviour of the surrogated system. Changing components of the system accounts for scenarios where components (e.g., pipes, pumps, or tanks) are added to or removed from the system. Even though the system changes, it is still related enough to leverage a model pre-trained on that water network. In a similar way, two networks can share enough resemblance (e.g., a subsystem of another network, two skeletonized networks, or two networks with similar topology or geography) that it makes sense to use an SM from one as a pre-trained SM for the other. Lastly, the case in which the system's behaviour changes and the metamodel no longer applies, also known as concept drift, can be addressed using transfer learning. Here, the two related water networks are the same network in two different periods.
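A minimal sketch of this idea is given below: the weights of a surrogate trained on one network are reused as the starting point for a related network and fine-tuned with a smaller dataset. The architectures and the assumption of identical input dimensions are illustrative simplifications:

```python
import torch
import torch.nn as nn

# Surrogate trained on network A (training omitted; weights assumed available).
surrogate_A = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 1))

# Surrogate for the related network B starts from A's parameters.
surrogate_B = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 1))
surrogate_B.load_state_dict(surrogate_A.state_dict())   # transfer the parameters

# Freeze all but the last layer, then fine-tune only that layer with the
# (few) training examples available for network B.
for layer in list(surrogate_B.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, surrogate_B.parameters()), lr=1e-4
)
```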
4.4. Gaps in Knowledge
Based on the above critical analyses of metamodels and the issues identified, the following key gaps in knowledge are summarised here:
1. Lack of depth in the optimisation of complex objectives and in uncertainty analysis for water networks using MLSMs. There are still additional and more complex objectives that can be optimised with the aid of MLSMs, for instance, robustness and interventions under deep uncertainty.
2. Lack of benchmark water networks, especially for UDSs and complex cases. First, this hinders the development and comparison of algorithms across studies; second, there is still a lack of research on how the response surface changes as the complexity of the water system increases, especially for large systems.
3. Current MLSMs' limitations prevent advanced metamodeling applications. MLSMs can easily grow in size when the complexity of the response surface increases, most of the applications do not consider the uncertainty added by the metamodel, and their structure makes them rigid and not (re)usable for other cases.
5 Research directions
Based on the identified gaps, three main lines for future research are suggested. They consider the current and future needs in applications on UWNs as well as the potential of MLSMs to meet them.
5.1 Advanced applications
The current needs for adaptable water infrastructure are driven by factors such as population growth, urbanization, and climate change. As indicated in the UN-Water report “Water and Climate Change”, taking adaptation and mitigation measures benefits water resources management and improves the provision of water supply and sanitation services. In addition, it contributes to combating both the causes and the impacts of climate change while contributing to meeting several of the Sustainable Development Goals (UNESCO, 2020). In UWNs, multi-objective optimisation and uncertainty analysis play a key role in the search for adaptation measures and decision making, and MLSMs can help improve and accelerate their implementation.
Optimisation applications will increase in the number and complexity of their inputs and outputs. Increasing the number of inputs, i.e., decision variables and design interventions (e.g., nature-based solutions), allows exploring more alternatives, considering uncertainty, or assessing multiple scenarios. On the other hand, the output of the optimisation is leaning towards complex objectives such as multi-objective robustness (e.g., Kasprzyk et al., 2013), multiple technical performance metrics (e.g., Fu et al., 2013), pro-active maintenance (Kumar et al., 2018), complex water quality indicators (Jia et al., 2021), and human values (Doorn, 2021). Multi-objective optimisation allows identifying solutions that balance trade-offs among objectives, for instance, cost and resilience (Wang et al., 2015). Naturally, when considering more objectives, the computational load increases, especially when those objectives are computationally expensive (e.g., robustness). In previous phases of research on optimisation, metamodels were seen as an aid, but as optimisation gradually evolves to consider additional and more complex objectives, metamodels become indispensable (e.g., Beh et al., 2017).
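As a generic illustration of why the surrogate becomes indispensable, the sketch below screens a large set of candidate designs with a cheap surrogate and verifies only a shortlist with the expensive original model; both cost functions are placeholders rather than any specific method from the reviewed papers:

```python
import numpy as np

def surrogate_cost(design):          # trained MLSM (fast evaluation), placeholder
    return np.sum(design ** 2)

def original_model_cost(design):     # hydraulic simulation (expensive), placeholder
    return np.sum(design ** 2) + 0.01 * np.random.randn()

rng = np.random.default_rng(0)
candidates = rng.uniform(0.1, 1.0, size=(10_000, 8))     # e.g., candidate pipe diameters

# Screen all candidates with the surrogate, keep the 20 most promising,
# and verify only those with the original model.
scores = np.array([surrogate_cost(d) for d in candidates])
shortlist = candidates[np.argsort(scores)[:20]]

verified = [(original_model_cost(d), d) for d in shortlist]
best_cost, best_design = min(verified, key=lambda t: t[0])
```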
Regarding uncertainty analysis, it is necessary to have fast, reliable, and flexible metamodels that can adapt to the multiple conditions and criteria under which the systems are evaluated. Traditionally, simplified models have been preferred for this task; however, RS metamodels become appealing alternatives when dealing with more complex objective functions and original models. Metamodels should play a key role in the development of frameworks for robustness-driven design. This application has major implications for UDSs, since no MLSM study focused on uncertainty analysis, even though the evidence suggests the criteria used for the design of these systems are not necessarily robust (Ng et al., 2020). Although uncertainty analysis entails an intrinsic increase in the computational effort, the benefits it brings outweigh the challenges it represents. According to the IPCC (2021b), UDSs are expected to receive more intense rainfall events based on climatic projections, but considerable uncertainty remains.
The community should further research combined RS-LFPB applications, to further integrate MLSMs with physically-based models for accelerating the underlying hydrodynamic engines. Likewise, physically-based models could be hybridized by incorporating an ML model that corrects the outputs of the original model for higher accuracy, accounting for the real behaviour of the system. Looking ahead, ML algorithms could detach from the physically-based model and replace its functioning with a cheaper-to-run version based on increasingly available real-world data (e.g., digital twins for UWNs (IWA, 2021)).
5.2 Benchmarking and large network behaviour
The lack of benchmark models is a gap that was already identified by Maier et al. (2014), who set out the characteristics and recommendations for valuable benchmarks, including non-trivial real-world problems with a representative range of decision problems characteristic of water systems. The review shows that UDSs lack such benchmarks. To overcome this issue, we recommend implementing an approach similar to that of the Kentucky database, with applications such as real-time control, outflow, and flood prediction. For WDSs, it is appropriate to enlarge the current databases to account for new objectives, interventions, performance metrics, and real case examples. Regarding metamodels, the benchmarks should also include a reference model against which to compare computational saving and accuracy, with suggested performance metrics such as NSE, RMSE, or the number of model executions.
As Goodfellow et al. (2016) indicate, having benchmark databases with real cases is one of the reasons why deep learning has recently become a crucial technology in several disciplines. In AI, datasets went from hundreds or thousands of examples in the early 1980s to millions of examples after 2010. Nowadays, thanks to the increase in connectivity and the digitalization of our society, a large number of ML algorithms can be fed with the information they require to achieve high accuracy. Since ML and DL models depend on their training sets, their success goes hand in hand with the size and quality of available datasets, preferably with real information. The UWNs' research community is taking its first steps in this direction. One example is the Bellinge dataset (Pedersen et al., 2021), covering the UDS of a suburb of the city of Odense, Denmark, which is now available for “independent testing and replication of results from future scientific developments and innovation within urban hydrology and urban drainage system research”. This dataset includes 10 years of asset data (information from manholes and links), sensor data (level, flow, and power meters), rain data, hydrodynamic models (MIKE Urban and EPA SWMM), and other information. Similar examples are needed to enable the exploration of metamodels' responses in networks of different characteristics (e.g., size, connectivity, slope).
As for the size of the networks, further research is required to assess the response surface of large networks. Specifically, new benchmark datasets should also include complex network cases for their study. These can be large networks or medium-sized cases with high complexity. Considering that the larger the network, the longer the time required to generate and use the training data, significant efforts are required on this matter. Metamodels could aid in reducing the computational times that obstruct studying the response surface of large and complex systems. Nonetheless, new metamodels are required to account for the complexity of these cases while using as few training scenarios as possible.
5.3 Unexplored advanced metamodeling technologies
ML is the area with the highest growth in academic output in recent years. However, the field of MLSMs for UWNs has not yet considered the new tools and algorithms recently developed by researchers in fundamental AI or in other applied disciplines. These advancements include DL architectures that embed assumptions about the data into the ANNs, yielding robust, interpretable, and transferrable models. This new wave of AI formalizes the attempts to add knowledge about the modelled processes as well as to extract knowledge from the results.
5.3.1 Inductive bias – Deep learning: Graph Neural Networks
The curse of dimensionality can be addressed by including inductive biases. Following the work of Battaglia et al. (2018), we define inductive bias as the “expression of assumptions about either the data-generating process or the space of solutions”. Inductive bias can also be embedded in the architecture of the model by leveraging the inner structure of the data, which may be spatial, temporal, or relational. Exploiting the structural information of the data can reduce the number of parameters, and consequently the required training examples, through parameter sharing and sparsity of connections. The data structure gives information about the similarity of the data points in a relevant dimension (e.g., distance, time, connection). In that sense, similar data can be treated analogously (parameter sharing) and dissimilar data can remain unrelated (sparse connectivity).
Inductive bias nudges a learning algorithm to prioritize some solutions over others. This allows finding high-performing solutions more easily than when it is not considered. Ideally, involving inductive bias improves the search for solutions without compromising the performance, as long as the right inductive bias is chosen; otherwise, it can lead to suboptimal performance (Battaglia et al., 2018). For example, when surrogating the pressure at the nodes of a WDS with a neural network (e.g., Broad et al., 2005; Meirelles et al., 2017) there are multiple metamodel solutions, i.e., architectures with specific parameter values that can approximate the response surface described by the training data. Nevertheless, when adding inductive bias, the set of possible solutions shrinks to a subset of solutions that comply with predefined characteristics, for example, having graph structure, following physical laws, or agreeing with measurements.
The most common components in DL are fully connected, convolutional, recurrent, and, more recently, graph layers. Fully connected layers have a weak inductive bias, while each of the remaining types exploits some relation or invariance in the data. The convolutional layers typical of convolutional neural networks (CNNs) leverage the regular structure of grids, such as images, and connect information according to Euclidean closeness. Recurrent neural networks (RNNs) consist of recurrent units which consecutively process data sequences, such as time series, and connect information according to sequential similarity. On the other hand, graph neural networks (GNNs) extend DL methods to non-Euclidean data, such as graphs, where entities are connected by relations or, in graph terminology, nodes are connected by edges.
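To make the relational inductive bias tangible, the sketch below implements a minimal message-passing layer on a toy network adjacency matrix; dedicated libraries such as PyTorch Geometric provide far richer variants, and all dimensions and features here are placeholders:

```python
import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """Minimal message-passing sketch: each node aggregates the features of its
    neighbours (given by the network adjacency matrix) and passes the result
    through a linear transformation shared by all nodes (parameter sharing)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, node_features, adjacency):
        adj = adjacency + torch.eye(adjacency.shape[0])   # self-loops keep own information
        degree = adj.sum(dim=1, keepdim=True)             # normalise by node degree
        aggregated = (adj @ node_features) / degree       # aggregate neighbour features
        return torch.relu(self.linear(aggregated))

# Toy water-network graph: 4 nodes, a symmetric adjacency matrix (sparse
# connectivity), and per-node features (e.g., demand, elevation, head).
adjacency = torch.tensor([[0., 1., 0., 0.],
                          [1., 0., 1., 1.],
                          [0., 1., 0., 1.],
                          [0., 1., 1., 0.]])
node_features = torch.rand(4, 3)

layer = SimpleGraphLayer(in_features=3, out_features=8)
node_embeddings = layer(node_features, adjacency)   # shape: (4, 8)
```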
Given their relational inductive bias, GNNs are the most suitable DL architecture for applications in UWNs, since the natural structure of these systems is a graph. Researchers have already exploited graph-theoretical concepts to develop decomposition models of WDNs (Deuerlein, 2008), assess the resilience of sectorized WDNs (Herrera et al., 2016), and identify critical elements in UWNs (Meijer et al., 2018, 2020). Furthermore, there are already some applications of GNNs in UWNs. In WDSs, Tsiami & Makropoulos (2021) employed this architecture for cyber-physical attack detection using a graph created from sensors in the water system. In UDSs, Belghaddar et al. (2021) applied this method to database completion of wastewater networks.
GNNs operate on the graph domain, which allows them to leverage the pre-existing network topology of the data, and they have gained considerable attention in recent years due to their ability to include relational structure from connected entities. Even though GNNs' outputs remain hard to explain, there are efforts to generate explanations of their outputs, e.g., GNNExplainer (Ying et al., 2019). As noted by Battaglia et al. (2018), “the entities and relations that GNNs operate over often correspond to things that humans understand (such as physical objects), thus supporting more interpretable analysis and visualization”. In this way, GNNs are not entirely explainable, but they are more explainable than other DL architectures.
It is also possible to combine layer types in problems that contain more than one structure, as in the case of UWNs, which have temporal, spatial, and topological variability. An example of the application of these graph models to civil infrastructure was developed by Sun et al. (2020), who included the spatial and temporal relations of a road network for traffic forecasting. This infrastructure has multiple parallels with UWNs, including its graph connectivity, spatio-temporal variability, and human interaction. Another similar infrastructure with more examples can be found in power systems, for which GNNs have been used in key applications such as fault scenario application, time series prediction, power flow calculation, and data generation (Liao et al., 2021). For an in-depth review of GNN architectures, the reader is referred to Zhou et al. (2018).
This architecture presents an opportunity to leverage the structure of the data generated in UWNs to decrease the number of parameters, and consequently the required training data, which enables creating SMs of larger networks and of more numerous and complex objectives. By conditioning the characteristics of the solutions, the metamodels gain the ability to generalize to similar cases. For example, pipe changes in a network configuration could be better represented with a GNN-based metamodel. This GNN SM could adjust itself without modifying the underlying structure, which would likely be required in the case of other metamodels that do not consider this inductive bias.
5.3.2 Third wave of Artificial Intelligence
The US Defense Advanced Research Projects Agency (DARPA, 2016) separates the different phases of AI into three waves. The first wave refers to the past approaches and the birth of AI, the second wave is the current and popular phase of high-performing black boxes, and lastly, the third wave is proposed for the future of AI with models leaning towards robustness and explainability.
Robustness refers to the ability to include uncertainty in the calculation of the outputs of a model; in this way, the user receives not a single deterministic answer but a range of possible values, usually represented by an expected value (e.g., mean) and a measure of uncertainty (e.g., variance). According to Gawlikowski et al. (2021), methods for estimating uncertainty in ANNs can be split into four types: single deterministic methods, Bayesian methods, ensemble methods, and test-time augmentation methods. Each of these lines offers an estimation of the degree to which the neural network is certain of its output. This aspect is relevant when quantifying how likely the metamodel is to detach from the response surface, which may cause, depending on the application, optimal solutions to be omitted, outflows to be missed, or floods to be underestimated. Recommended methods for implementation in MLSMs include Bayesian neural networks (e.g., Zhu & Zabaras, 2018) or single deterministic methods; the latter are recommended given the low additional computational burden they introduce.
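As one concrete example of obtaining such an expected value and uncertainty measure, the sketch below applies Monte Carlo dropout, a common approximation usually grouped with the Bayesian methods; the architecture, data, and number of samples are placeholders:

```python
import torch
import torch.nn as nn

# Monte Carlo dropout sketch: dropout is kept active at prediction time and
# the network is evaluated several times, yielding a mean and a spread per output.
model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)
# ... assume the model has already been trained as usual ...

x = torch.rand(1, 10)
model.train()                      # keep dropout active during prediction
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])

mean = samples.mean(dim=0)         # expected value
std = samples.std(dim=0)           # measure of uncertainty
```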
Research in explainability has also gained popularity in recent years. In the case of MLSMs, having an explainable model would allow us to better understand the response surface of the original model or the solution space. An improved comprehension of the response surface would facilitate gaining better insight into the behaviour of different algorithms (e.g., evolutionary methods), ultimately contributing to determining which type of heuristic is best suited to each water network application, a topic of which we still have very little understanding (Maier et al., 2014). On the other hand, explaining the solution space would allow gaining insight into which components of the real system affect its performance and, most importantly, how they affect it. This could drive interventions in the physical water network to improve its performance. Recommended models in this category are GNNs, as already reported by Tsiami & Makropoulos (2021), who were able to perform a removal analysis to quantify the contribution of each considered component (e.g., valves, tanks, and pumps) of the physical water network to the model's performance. Since the structure of GNNs resembles the underlying system, it is possible to relate events in the metamodel to the actual system.
5.3.3 Transferrable AI models
The studies reviewed in this paper presented methodologies for training a metamodel to surrogate a computationally expensive model. Although the methodology is transferrable, meaning the steps can be followed and repeated to obtain a similar metamodel in another case study, the metamodel itself cannot be transferred to a new case study. This implies that all the metamodeling time spent on training is specific to each case. Through transferrable models, authors may develop not only methodologies but also pre-trained SMs, which can be adapted to other cases, lowering the amount of training needed for the new network.
Having a transferrable model would allow training the metamodel with data not only from the case study at hand but also from other real and synthetic cases, for example, the benchmark datasets discussed previously. This increase in the available training information is expected to improve the performance of the metamodel, or even allow it to exist for cases in which data are scarce, for example, very computationally expensive UWNs for which training examples are costly. Once again, inductive bias plays a role: since the assumptions added to the algorithm delimit a smaller solution space, the ML models can be used as pre-trained solutions for other tasks. In the AI domain, this practice is referred to as transfer learning. Transfer learning is mainly implemented for specialized deep learning methods, i.e., architectures with strong inductive bias. It has been successfully implemented for applications such as the diagnosis of medical images using CNNs (Vogado et al., 2018), the prediction of air pollutants using RNNs (Hang et al., 2020), and bioinformatics as well as social-network classification tasks with GNNs (Verma & Zhang, 2019), among others (Weiss et al., 2016).
For transferrable SMs in UWNs, GNNs seem to be the natural option based on the agreement between the structure of the real system and the inductive bias of GNNs. In an analogous way to how CNNs learn filters that are independent of the input (i.e., images), GNNs learn filters that can be used across cases (e.g., water networks). Adding structure and physics to the metamodel allows including more domain knowledge in the ANN, which improves generalization capabilities. A relevant example of such a model is the mass-conserving RNN for rainfall-runoff modelling developed by Hoedt et al. (2021), in which the model architecture embeds the mass conservation principle, increasing accuracy and improving the model's interpretability. At the same time, transferability opens the door to new applications, such as online optimisation of interventions, by learning the effect of changes in the topology and components of the network.
Using physical information, such as the knowledge embedded in hydrodynamic models, also allows generating hybrid and general models. These models bridge the best of two domains: physically-based and data-driven. On this, Vojinovic et al. (2003) indicated that “the major advantage of integrating both a deterministic (numerical) model and a stochastic (data-driven) model over using the stochastic data-driven model alone is that the already available deterministic model quality is exploited and improved, instead of starting from scratch and throwing away all knowledge.” Furthermore, combining domain knowledge with transferable models opens the possibility of creating general models. This type of model detaches from the training set on which it was trained so that its predictions can be applied to unseen scenarios. Following this trend, Kratzert et al. (2019) developed a recurrent ANN trained on basins from a continental dataset using meteorological time series data and static catchment attributes, and they were able to outperform hydrological benchmark models calibrated on individual catchments. The analogous application in UWNs would be an ML-based hydrodynamic model trained on a set of distribution or drainage systems which can generalize to independent, unknown water networks. Such “DeEPANET” or “DeepSWMM” models could be developed by leveraging the inductive bias of GNNs and accounting for the time dimension with recurrent layers or by resorting to an encoder-decoder architecture (Du et al., 2020).
6 Conclusions
This work reviews the current state of the application of MLSMs in urban water networks and proposes promising directions forward based on recent and successful developments in ML.
In terms of purpose, the main uses of MLSMs in UWNs are optimisation and real-time problems. Even though MLSMs accelerate optimisation algorithms by increasing the speed of individual iterations, these algorithms have multiple disadvantages. The training process can be time-consuming, and the required size of the training dataset cannot be known a priori as it depends on the complexity of the input-output mapping. Regarding case study type, the UWNs in which MLSMs are applied vary in size and type. For analysing the complexity of the case studies, we preferred to consider WDSs and UDSs separately. Regarding their use in WDSs, the papers follow a clear pattern: the development and trial are usually made on medium or small benchmark networks, and the subsequent implementation of the metamodel is done on a large real network. On the other hand, UDSs have no applications on benchmark networks due to their lack of availability. In terms of the metamodel, except for some applications of SVMs or RNNs, the vast majority of applications used MLPs as SMs. This method has been successfully implemented due to its high accuracy and flexibility regarding the inputs and outputs that it can map. Nevertheless, MLSMs present multiple drawbacks that may even harm the development of an application. It is advisable to consider whether an MLSM is worthwhile before starting its training.
Based on the reviewed literature, the following issues and gaps in knowledge were identified regarding existing MLSMs: limitations of the MLSMs themselves, lack of depth in current applications, and insufficient benchmarking datasets.
The following research directions are suggested to address the above key gaps in knowledge:
Exploring the potential of MLSMs for approximating UWNs' components and correcting predictions with real data can lead to independent ML models of the water networks that leverage both physical domain knowledge and measurements. New MLSMs are encouraged to leverage inductive bias and the increasing availability of data to help UDS and WDS operators. The new advancements in ML, especially GNNs, have great potential to advance surrogate modelling in UWNs. Water network modellers could speed up calculations for larger and more complex cases, being able to design more robust and overall better urban water systems.