Chong Tian

and 4 more

Recent studies have used deep reinforcement learning (RL) to establish real-time control of urban drainage systems (UDSs) for flooding mitigation. However, only model-based reinforcement learning has been considered, which means that a mathematical model of the UDS is needed during the RL training process. Although this is a natural way to establish an RL system, it causes several problems, including (i) too much training time, (ii) too “rich” cache data, and (iii) too “perfect” a training environment. To address these problems, a model-free RL training framework based on two Koopman emulators is proposed and validated through simulation of a UDS in a city located in eastern China. This framework achieves shorter training time and higher efficiency of data usage through the fast nonlinear emulation capability of Koopman emulators and by matching the dimension of the emulator’s observable to that of the RL state. The emulation also introduces a certain randomness into the RL training process. According to the results, compared with model-based RLs, this framework achieves a similar control effect with a training process that is 20 to 23 times faster and a data-usage efficiency that is 79.67 times higher. The uncertainty analysis shows that slight perturbations that do not statistically change the control system during training and testing do not affect the control effect of either model-based or model-free RLs. Meanwhile, the performance of the Koopman emulators of the UDS is strongly related to their hyperparameters and to the similarity between training data and test data.
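To make the idea of a Koopman emulator concrete, the following is a minimal sketch of a generic least-squares Koopman fit on a toy system. It is not the authors' emulator: the two-state system in `step`, the observable map `lift`, and the function names are all assumptions chosen so that the lifted dynamics are exactly linear. In an RL setting, the lifted vector could in principle serve directly as the agent's state, which is one way to read the abstract's remark about matching the observable and state dimensions.

```python
import numpy as np

def lift(x):
    """Hypothetical observable map: state (x1, x2) -> (x1, x2, x1^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1])

def step(x, a=0.9, b=0.5):
    """Toy nonlinear system standing in for a drainage simulator (assumption)."""
    x1, x2 = x
    return np.array([a * x1, b * x2 + (a * a - b) * x1 * x1])

def fit_koopman(states, next_states):
    """Least-squares fit of K such that lift(x_next) ~= K @ lift(x)."""
    G = np.array([lift(x) for x in states])       # shape (n, 3)
    H = np.array([lift(x) for x in next_states])  # shape (n, 3)
    Kt, *_ = np.linalg.lstsq(G, H, rcond=None)    # solves G @ K.T ~= H
    return Kt.T

# Collect one-step transitions from the toy system and fit the linear operator.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
Y = np.array([step(x) for x in X])
K = fit_koopman(X, Y)

def emulate(x0, n_steps):
    """Roll the state forward using only matrix products in the lifted space."""
    z = lift(x0)
    for _ in range(n_steps):
        z = K @ z
    return z[:2]  # the first two observables are the original state
```

Once `K` is fitted, each emulated step is a single matrix-vector product, which illustrates why an emulator of this kind can be much cheaper to query than a full simulation during training.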

WenChong Tian

and 4 more

Real-time control (RTC) has proved to be an efficient tool for helping combined sewer systems respond to different rainfalls and for improving combined sewer overflow (CSO) and flooding reduction. Recently, a new RTC approach based on deep Q-learning was developed for flooding control in stormwater systems. Although this work marked a milestone for urban water management in the direction of the smart city, some further steps are still worth exploring. For instance, the control effects of different kinds of reinforcement learning (RL) are unknown. Also, the safety and performance of RLs still need further improvement. In this paper, three tasks are completed to address these problems. First, five individual RLs are used to design five RTC systems, which are compared with each other. Then, a hybrid RTC system, called the Voting system, is developed based on the combination of multiple RLs and model predictive control for better safety. Meanwhile, a new RL training method, called Q value improvement (QVI), is provided to improve the RLs’ performance. All the models are evaluated by simulating the real-time implementation using a SWMM model of a city in eastern China. According to the results: (i) all five trained RLs show promise in CSO and flooding reduction, with different control effects and trajectories; (ii) Voting selects a relatively safer control trajectory than a single RL, providing a guarantee of safety; (iii) QVI improves the performance of RLs, with a maximum improvement rate of 0.276431.
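One possible reading of the Voting idea is that several trained policies each propose an action and a predictive model picks the proposal with the lowest predicted cost. The sketch below illustrates that reading only; the abstract does not give the actual voting rule, so the function `vote`, the three stand-in policies, and the one-step tank model in `predict_cost` are all hypothetical.

```python
def vote(state, agents, predict_cost):
    """Return the proposed action with the lowest predicted cost."""
    proposals = [agent(state) for agent in agents]
    return min(proposals, key=lambda action: predict_cost(state, action))

# Hypothetical stand-ins for trained RL policies: each maps a tank
# level (m) to a pump setting in {0, 1, 2}.
agents = [
    lambda level: 0,
    lambda level: 1 if level > 1.0 else 0,
    lambda level: 2 if level > 2.0 else 1,
]

def predict_cost(level, pump, inflow=0.8, capacity=3.0):
    """Toy one-step prediction: overflow depth plus a small pumping penalty."""
    next_level = max(level + inflow - 0.5 * pump, 0.0)
    overflow = max(next_level - capacity, 0.0)
    return overflow + 0.01 * pump

chosen_high = vote(2.5, agents, predict_cost)  # tank close to capacity
chosen_low = vote(0.5, agents, predict_cost)   # tank nearly empty
```

In this toy setup the selector rejects the "do nothing" proposal when the tank is nearly full, because the model predicts an overflow, and it avoids unnecessary pumping when the tank is nearly empty.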