Transferring hydrologic data across continents -- leveraging US data to
improve hydrologic prediction in other countries
Abstract
There is a drastic geographic imbalance in available global streamflow
gauge and catchment property data, with additional large variations in
data characteristics, so that models calibrated in one region cannot
normally be migrated to another. In data-sparse regions, non-transferable
machine learning models are therefore habitually trained on small local
datasets. Here we show that transfer learning (TL), in the sense of
weight initialization and weight freezing, allows long
short-term memory (LSTM) streamflow models that were trained over the
Conterminous United States (CONUS, the source dataset) to be transferred
to catchments on other continents (the target regions), without the need
for extensive catchment attributes. We demonstrate this possibility for
regions where data are dense (664 basins in the UK), moderately dense
(49 basins in central Chile), and scarce, with only globally available
attributes (5 basins in China). In both China and Chile, the TL models
performed significantly better than locally trained models. The benefits
of TL increased with
the amount of available data in the source dataset, but even 50-100
basins from the CONUS dataset provided significant value for TL. The
benefits of TL were greater than those of pre-training the LSTM on the
outputs of an uncalibrated hydrologic model. These results suggest hydrologic
data around the world have commonalities which could be leveraged by
deep learning, and significant synergies can be gained through a simple
modification of the currently predominant workflows, greatly expanding
the reach of existing big data. Finally, this work diversifies existing
global streamflow benchmarks.
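
To make "weight initialization and weight freezing" concrete, the following
is a minimal sketch and not the authors' LSTM code. It assumes a toy
one-unit network with hand-written gradients, and the names (forward, train,
source_data, target_data) and all data values are hypothetical. The model is
first trained on a larger "source" set, its weights are copied to initialize
the "target" model, and the input-side weights are then frozen while the
output-side weights are fine-tuned on a handful of target samples.

```python
import math

def forward(p, x):
    # Hidden activation and prediction of a one-unit tanh network.
    h = math.tanh(p["w1"] * x + p["b1"])
    return h, p["w2"] * h + p["b2"]

def mse(p, data):
    # Mean squared error of the model over (x, y) pairs.
    return sum((forward(p, x)[1] - y) ** 2 for x, y in data) / len(data)

def train(p, data, frozen=(), lr=0.1, epochs=200):
    # Full-batch gradient descent; parameters named in `frozen` are skipped.
    p = dict(p)  # copy, so the caller's weights are left untouched
    for _ in range(epochs):
        grad = {k: 0.0 for k in p}
        for x, y in data:
            h, pred = forward(p, x)
            err = 2.0 * (pred - y) / len(data)
            grad["w2"] += err * h
            grad["b2"] += err
            dz = err * p["w2"] * (1.0 - h * h)  # backpropagate through tanh
            grad["w1"] += dz * x
            grad["b1"] += dz
        for k in p:
            if k not in frozen:
                p[k] -= lr * grad[k]
    return p

# Hypothetical synthetic data: a larger source set and a 5-sample target set.
source_data = [(i / 10, 1.5 * i / 10) for i in range(-10, 11)]
target_data = [(-1.0, -0.5), (-0.5, -0.1), (0.0, 0.3), (0.5, 0.7), (1.0, 1.1)]

# Step 1: train on the source dataset.
init = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}
source_params = train(init, source_data)

# Step 2: initialize from the source weights, freeze the input-side
# weights, and fine-tune only the output-side weights on the target data.
loss_before = mse(source_params, target_data)
transferred = train(source_params, target_data, frozen=("w1", "b1"))
loss_after = mse(transferred, target_data)

print(f"target MSE before fine-tuning: {loss_before:.4f}")
print(f"target MSE after fine-tuning:  {loss_after:.4f}")
```

Which weights to freeze is a design choice. The sketch freezes the
input-side weights purely for illustration and does not reflect the freezing
configuration evaluated in this work.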