Automatic Quality Control of Crowdsourced Rainfall Data with Multiple Noises: A Machine Learning Approach
  • Geng Niu,
  • Pan Yang,
  • Yi Zheng,
  • Ximing Cai,
  • Huapeng Qin
Geng Niu
Peking University
Pan Yang
University of Illinois Urbana-Champaign

Corresponding Author: [email protected]

Yi Zheng
Southern University of Science and Technology
Ximing Cai
University of Illinois Urbana-Champaign
Huapeng Qin
Peking University

Abstract

In geophysics, crowdsourcing is an emerging, non-traditional approach to environmental monitoring that encourages individual citizens to contribute data. Because they rely on undertrained citizens and imprecise low-cost sensors, crowdsourced data applications suffer from several types of noise that can degrade overall monitoring accuracy. In this study, we propose a machine learning approach for automatic Crowdsourced data Quality Control (CSQC) that detects and removes noisy data points in spatially and temporally discrete crowdsourced observations. We design a set of features from the original and interpolated rainfall data and use them to train and test CSQC models based on both supervised and unsupervised machine learning algorithms. We also test the performance of the CSQC models under various scenarios without further retraining (hereafter referred to as transferability). Results based on synthetic but realistic data show that the CSQC model can significantly reduce the overall rainfall estimation error. Under the stationary assumption, CSQC models based on both supervised and unsupervised algorithms perform reasonably well in identifying noisy data and reducing the overall rainfall estimation error; however, if the model is transferred to other cities with a different rainfall structure or noise composition (without retraining), the supervised Multi-Layer Perceptron (MLP) performs best.
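The idea of comparing each crowdsourced reading with a value interpolated from its neighbors can be illustrated with a minimal sketch. This is not the CSQC model from the study: it uses a single hand-made feature (the gap between a reading and a leave-one-out inverse-distance-weighted estimate) and a fixed cutoff in place of a trained classifier. The grid layout, the synthetic rainfall field, the two injected errors, and the 5.0 threshold are all assumptions chosen for illustration.

```python
import math


def idw_leave_one_out(points, values, idx, power=2.0):
    """Estimate the reading at station idx from all other stations
    using inverse-distance weighting (IDW)."""
    xi, yi = points[idx]
    num = den = 0.0
    for k, ((x, y), v) in enumerate(zip(points, values)):
        if k == idx:
            continue  # leave the target station out of its own estimate
        w = 1.0 / math.hypot(x - xi, y - yi) ** power
        num += w * v
        den += w
    return num / den


def flag_noisy(points, values, threshold):
    """Return indices whose reading differs from its IDW estimate
    by more than threshold (a stand-in for a trained classifier)."""
    return [
        k for k in range(len(points))
        if abs(values[k] - idw_leave_one_out(points, values, k)) > threshold
    ]


# Synthetic 5 x 5 station grid with a nearly uniform rainfall field (assumed).
points = [(i, j) for i in range(5) for j in range(5)]
truth = [10.0 + 0.1 * ((3 * i + j) % 5 - 2) for i, j in points]

# Inject two hypothetical errors: a scale error and a missed event.
observed = list(truth)
observed[points.index((2, 2))] *= 3.0
observed[points.index((0, 4))] = 0.0

flagged = flag_noisy(points, observed, threshold=5.0)
flagged_points = [points[k] for k in flagged]

# Compare the error of the area-mean rainfall before and after removal.
true_mean = sum(truth) / len(truth)
kept = [v for k, v in enumerate(observed) if k not in flagged]
err_before = abs(sum(observed) / len(observed) - true_mean)
err_after = abs(sum(kept) / len(kept) - true_mean)

print("flagged stations:", flagged_points)
print("area-mean error before:", round(err_before, 3))
print("area-mean error after:", round(err_after, 3))
```

In this toy setting the two corrupted stations are the only ones flagged, and dropping them brings the area-mean rainfall closer to the true value. A fixed cutoff like this would not be expected to transfer across rainfall regimes, which is the kind of limitation the learned models in the study are meant to address.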
Nov 2021. Published in Water Resources Research, volume 57, issue 11. DOI: 10.1029/2020WR029121