
Combined Sewer Overflow and flooding Reduction through a Safe Real-Time Control based on Multi-Reinforcement Learning, Model Predictive Control, and q value improvement
  • WenChong Tian,
  • Zhenliang Liao,
  • Guozheng Zhi,
  • Zhiyu Zhang,
  • XUAN WANG
WenChong Tian
Tongji University
Zhenliang Liao
Tongji University

Corresponding Author:[email protected]

Guozheng Zhi
Shanghai Urban Water Resources Development and Utilization National Engineering Center Co. Ltd.
Zhiyu Zhang
Tongji University
XUAN WANG
Tongji University

Abstract

Real-time control (RTC) has proved to be an efficient tool for helping combined sewer systems respond to different rainfall events and for reducing combined sewer overflow (CSO) and flooding. Recently, a new RTC approach based on deep Q-learning was developed for flooding control in stormwater systems. Although that work marked a milestone for urban water management on the way to smart cities, several further steps are still worth exploring. For instance, the control effects of different kinds of reinforcement learning algorithms (RLs) are unknown. The safety and performance of RLs also need further improvement. In this paper, three tasks are completed to address these problems. First, five individual RLs are used to design five RTC systems, which are compared with each other. Then, a hybrid RTC system, called the Voting system, is developed by combining multiple RLs with model predictive control (MPC) for better safety. Meanwhile, a new RL training method, called q value improvement (QVI), is proposed to improve the RLs' performance. All models are evaluated by simulating their real-time implementation with a SWMM model of a city in eastern China. According to the results: (i) All five trained RLs show promise in reducing CSO and flooding, with different control effects and trajectories. (ii) Voting selects a relatively safer control trajectory than a single RL, providing a guarantee of safety. (iii) QVI improves the performance of the RLs, with a maximum improvement rate of 0.276431.
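
The abstract does not spell out how the Voting system chooses among the RLs. As a minimal illustrative sketch only, one plausible reading is that each trained RL proposes a control trajectory, a predictive model scores each proposal, and the proposal with the lowest predicted overflow is kept. Everything named below (`vote`, `toy_cost`, the agent names, the single-tank model and its numbers) is a hypothetical stand-in and not the authors' implementation; in the paper the prediction would come from a SWMM simulation rather than a toy tank.

```python
from typing import Callable, Dict, Sequence

# Hypothetical representation: one integer pump setting per control step.
Trajectory = Sequence[int]


def vote(candidates: Dict[str, Trajectory],
         predicted_cost: Callable[[Trajectory], float]) -> str:
    """Return the name of the candidate with the lowest predicted cost."""
    return min(candidates, key=lambda name: predicted_cost(candidates[name]))


def toy_cost(trajectory: Trajectory,
             inflow: Sequence[float] = (3.0, 3.0, 3.0),
             capacity: float = 4.0) -> float:
    """Toy stand-in for a model rollout: total overflow of a single tank.

    Assumption: each active pump removes 2.0 units of water per step.
    """
    level = 0.0
    overflow = 0.0
    for q_in, pumps_on in zip(inflow, trajectory):
        level = max(0.0, level + q_in - 2.0 * pumps_on)
        spill = max(0.0, level - capacity)  # water above capacity overflows
        overflow += spill
        level -= spill
    return overflow


# Hypothetical proposals from three trained agents.
candidates = {
    "agent_a": [0, 0, 0],
    "agent_b": [1, 1, 1],
    "agent_c": [0, 1, 0],
}
selected = vote(candidates, toy_cost)
print(selected, toy_cost(candidates[selected]))
```

In this toy setting the proposal that keeps one pump running at every step produces no overflow and is therefore selected. Scoring every proposal with the same predictive model is what lets the selection step reject a trajectory from any single agent that the model predicts to be unsafe.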