Abstract
Wireless sensor networks are increasingly important in monitoring water
quality changes. High frequency monitoring can be used to gather water
quality information, identify existing problems and improve water
quality management activities. However, missing data are unavoidable
because of network communication issues, sensor maintenance or failure.
Data interpolation is a process for constructing missing values based on
known data points. Though traditional methods like polynomial or linear
interpolation are widely used in sensor data pre-processing, there are
still many limitations. First, current interpolation methods give poor
estimates when many consecutive data points within a period of time are
missing. Second, many interpolation methods reconstruct missing data
based on other parameters available at the same time step. When all
parameters are missing at that time step, these methods cannot be used.
In our work, we are
developing a sequence-to-sequence interpolation model (SIM) for
recovering missing data sequences in wireless sensor networks. SIM uses
the state-of-the-art sequence-to-sequence architecture. It consists of
two parts: an encoder that reads from the source water quality time
series data and a decoder that generates the missing data sequences. In
our design, a bidirectional Long Short-Term Memory (LSTM) network is used
as the encoder and decoder because it can use both past and future
information at a given time step. An attention mechanism is applied so
that SIM focuses on different parts of the input time series when
interpolating missing values at different time steps. We evaluated
SIM using time series data from the Queensland government’s water quality
monitoring network. Compared to Seasonal-ARIMA, SIM reduced MAE by 23.2%
and RMSE by 40.3% when recovering missing data in 2 adjacent time
steps. The reason for the superior performance is that SIM interpolates
missing data based on both the inner relationships between water quality
parameters and the accumulated information through time.
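
As a rough illustration of the attention step described above, the following minimal sketch shows how a decoder state could weight the encoder outputs differently at each output time step. It is not the SIM implementation: the dot-product scoring function, the names softmax and attend, and the toy vectors are all assumptions made for illustration, since the abstract does not specify the scoring function or the state dimensions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function, not taken from the paper).

    Returns (weights, context): one weight per input time step, and the
    weighted sum of the encoder states.
    """
    # Similarity between the decoder state and each encoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: encoder states averaged with the attention weights.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder outputs for four input time steps (illustrative values only).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
# Toy decoder state for the missing time step currently being generated.
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
```

In this sketch the weights sum to one, and input time steps whose encoder state is more similar to the current decoder state receive a larger share. A different decoder state, corresponding to a different missing time step, would shift the weights toward other parts of the input series.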