Flood Defense Standard Estimation Using Machine Learning and Its
Representation in Large-Scale Flood Hazard Modeling
Abstract
We propose a machine learning-based approach to estimate the flood
defense standard (FDS) for ungauged sites. We adopted random forest
regression (RFR) to characterize the relationship between the observed
FDS and ten explanatory factors contained in publicly available
datasets. We compared RFR with multiple linear regression (MLR) and
demonstrated the proposed approach in the conterminous United States
(CONUS) and England. The results showed the following: (1)
RFR performed better than MLR, with a Nash–Sutcliffe efficiency (NSE)
of 0.82 in the CONUS and 0.73 in England. The negative NSE obtained with
MLR indicated that the relationship between the FDS and the explanatory
factors could not be described by a linear function. (2) River flood factors
had higher importance than physical and socio-economic factors in the
FDS estimation. The proposed approach performed best when all factors
were used for prediction and did not provide satisfactory predictions
(NSE < 0.6) when only physical or only socio-economic factors were
used. (3) We estimated the FDS for all ungauged sites in
the CONUS and England. Approximately 80% and 29% of sites were
identified as high or highest standard (> 100-year return
period) in the CONUS and England, respectively. (4) We incorporated the
estimated FDS in large-scale flood modeling and compared the model
results with official flood hazard maps in three case studies. We
found clear overestimation in protected areas when flood defenses were
not taken into account, whereas the proposed approach successfully
represented the flood defenses.
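
For readers unfamiliar with the metric, the following is a minimal sketch of the standard Nash–Sutcliffe efficiency, included to show why a negative value signals a model that predicts worse than the observed mean. The function name and the sample FDS values are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """NSE = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared error of the "always predict the observed mean" baseline.
    baseline = sum((o - mean_obs) ** 2 for o in observed)
    if baseline == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - residual / baseline


# Hypothetical FDS values (return period in years), for illustration only.
observed = [10.0, 50.0, 100.0, 200.0]
perfect = list(observed)                                      # exact predictions
mean_only = [sum(observed) / len(observed)] * len(observed)   # baseline predictor
poor = [200.0, 100.0, 50.0, 10.0]                             # reversed ordering

nse_perfect = nash_sutcliffe_efficiency(observed, perfect)
nse_mean = nash_sutcliffe_efficiency(observed, mean_only)
nse_poor = nash_sutcliffe_efficiency(observed, poor)

print(nse_perfect, nse_mean, nse_poor)
```

An NSE of 1 corresponds to perfect predictions, 0 to a model no better than the observed mean, and any negative value to a model whose squared error exceeds that of the mean baseline.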