From hydrometeorology to water quality: can a deep learning model learn
the dynamics of dissolved oxygen at the continental scale?
Abstract
Dissolved oxygen (DO) sustains aquatic life and is an essential water
quality measure. Our ability to forecast DO levels, however,
remains limited. Unlike the increasingly intensive Earth surface and
hydroclimatic data, water quality data often have large temporal gaps
and sparse areal coverage. Here we ask the question: can a Long
Short-Term Memory (LSTM) deep learning model learn the spatio-temporal
dynamics of stream DO from intensive hydroclimatic and sparse DO
observations at the continental scale? That is, can the model harvest
the power of big hydroclimatic data and use them for water quality
forecasting? We used data from CAMELS-chem, a new dataset that
includes sparse DO concentrations from 236 minimally disturbed
watersheds. The trained model can generally learn the theory of DO
solubility under specific temperature, pressure, and salinity
conditions. It captures the bulk variability and seasonality of DO and
shows potential for forecasting water quality in ungauged basins
without training data. However, it often misses concentration peaks and
troughs where DO levels depend on complex biogeochemical processes. The
model surprisingly does not perform better where data are more
intensive. It performs better in basins with low streamflow variations,
low DO variability, high runoff ratio (> 0.45), and
precipitation peaks in winter. This work suggests that more frequent
data collection under anticipated DO peak and trough conditions is
essential to help overcome the issue of sparse data, an outstanding
challenge in the water quality community.