Jessica A Eisma and three co-authors

High-quality citizen science data can be instrumental in advancing science toward new discoveries and a deeper understanding of under-observed phenomena. However, the error structure of citizen scientist (CS) data must be well defined. Within a citizen science program, the errors in submitted observations vary, and their occurrence may depend on CS-specific characteristics. This study develops a graphical Bayesian inference model of error types in CS data. The model assumes that: (1) each CS observation is subject to a specific error type, each with its own bias and noise; and (2) an observation's error type depends on the error community of the CS, which in turn relates to characteristics of the CS submitting the observation. Given a set of CS observations and corresponding ground-truth values, the model can be calibrated for a specific application, yielding (i) the number of error types and error communities, (ii) the bias and noise of each error type, (iii) the error distribution of each error community, and (iv) the error community to which each CS belongs. Applied to CS rainfall observations from Nepal, the model identifies five error types and sorts CSs into four model-inferred communities. In the case study, 73% of CSs submitted data with errors in fewer than 5% of their observations. The remaining CSs submitted data with unit, meniscus, unknown, and outlier errors. A CS's assigned community, combined with the model-inferred error probabilities, can identify observations that require verification. With such a system, the burden of validating CS data is partly shifted from human effort to machine-learned algorithms.
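The two model assumptions can be made concrete with a small sketch. Each error community holds a probability over error types, and each error type has a bias and a noise level. Applying Bayes' rule to one observation and its ground-truth value then gives the probability of each error type. The sketch rests on several assumptions that are not taken from the study. The error is modeled as additive (observed = truth + bias + Gaussian noise). The parameters are fixed. Only three types are shown. All numbers and the names `ERROR_TYPES`, `COMMUNITIES`, `error_type_posterior` and `prior_error_probability` are hypothetical. The study's graphical model infers these quantities jointly, and this sketch does not reproduce its inference procedure.

```python
import math

# Hypothetical error types: (name, additive bias in mm, noise std in mm).
# Illustrative values only; unit and unknown errors are omitted for brevity.
ERROR_TYPES = [
    ("none", 0.0, 0.5),
    ("meniscus", 2.0, 1.0),
    ("outlier", 0.0, 50.0),
]

# Hypothetical error communities: probability of each error type above.
COMMUNITIES = {
    "reliable": [0.97, 0.02, 0.01],
    "error_prone": [0.60, 0.30, 0.10],
}


def normal_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def error_type_posterior(observed, truth, community):
    """P(error type | observation, ground truth, community) via Bayes' rule,
    assuming observed = truth + bias_k + N(0, std_k^2) for error type k."""
    residual = observed - truth
    weights = COMMUNITIES[community]
    joint = [w * normal_pdf(residual, bias, std)
             for w, (_, bias, std) in zip(weights, ERROR_TYPES)]
    total = sum(joint)
    return {name: p / total for (name, _, _), p in zip(ERROR_TYPES, joint)}


def prior_error_probability(community):
    """Probability that a new observation from this community has any error.
    It needs no ground truth, so it can rank observations for verification."""
    return 1.0 - COMMUNITIES[community][0]
```

For example, `error_type_posterior(12.0, 10.0, "error_prone")` assigns most of its probability to the hypothetical meniscus type, because a 2 mm residual is far more plausible under that type than under the no-error type.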

Gerrit Schoups and one co-author

To fully benefit from remotely sensed observations of the terrestrial water cycle, the bias and random errors in these datasets need to be quantified. This paper presents a Bayesian hierarchical model that fuses monthly water balance data and estimates both the corresponding data errors and the error-corrected water balance components (precipitation, evaporation, river discharge, and water storage). The model combines monthly basin-scale water balance constraints with probabilistic data error models for each water balance variable. Each data error model includes parameters that are themselves treated as unknown random variables, reflecting uncertainty in the errors. Errors in precipitation and evaporation data are parameterized as a function of multiple data sources, while errors in GRACE storage observations are described by a noisy sine wave model with parameters controlling the phase, amplitude and randomness of the sine wave. Error parameters and water balance variables are estimated using a combination of Markov chain Monte Carlo sampling and iterative smoothing. Application to semi-arid river basins in Iran yields (i) significant reductions in evaporation uncertainty during water-stressed summers, (ii) basin-specific timing and amplitude corrections of the GRACE water storage dynamics, and (iii) posterior water balance estimates with average standard errors of 4-12 mm/month for water storage, 3.5-7 mm/month for precipitation, 2-6 mm/month for evaporation, and 0-2 mm/month for river discharge. The approach can readily be extended to other datasets and other (gauged) basins around the world, possibly using customized data error models. The resulting error-filtered and bias-corrected water balance estimates can be used to evaluate hydrological models.
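Two ingredients of the model are easier to picture in code. The first is the monthly basin water balance constraint. The second is a noisy sine wave error on GRACE storage. The sketch below assumes several specifics that the abstract does not state: the constraint is written as S_t = S_{t-1} + P_t - E_t - Q_t in mm, the sine has a 12-month period, the error is added to true storage, and "randomness" is represented as independent Gaussian noise with standard deviation `noise_std`. The function names are hypothetical. The sketch is only a forward simulation. It does not perform the paper's Markov chain Monte Carlo and smoothing inference.

```python
import math
import random


def storage_from_balance(s0, precip, evap, discharge):
    """Integrate the monthly water balance S_t = S_{t-1} + P_t - E_t - Q_t (mm)."""
    storage = []
    s = s0
    for p, e, q in zip(precip, evap, discharge):
        s = s + p - e - q
        storage.append(s)
    return storage


def closure_residuals(storage, s0, precip, evap, discharge):
    """Monthly imbalance (S_t - S_{t-1}) - (P_t - E_t - Q_t); zero when the budget closes."""
    previous = [s0] + list(storage[:-1])
    return [(s - sp) - (p - e - q)
            for s, sp, p, e, q in zip(storage, previous, precip, evap, discharge)]


def noisy_sine_error(n_months, amplitude, phase, noise_std, rng):
    """Hypothetical GRACE error model: amplitude * sin(2*pi*t/12 + phase) + N(0, noise_std^2)."""
    return [amplitude * math.sin(2.0 * math.pi * t / 12.0 + phase)
            + rng.gauss(0.0, noise_std)
            for t in range(n_months)]


if __name__ == "__main__":
    rng = random.Random(0)
    precip = [30.0, 45.0, 20.0, 5.0]
    evap = [20.0, 25.0, 30.0, 15.0]
    discharge = [2.0, 4.0, 3.0, 1.0]
    true_storage = storage_from_balance(0.0, precip, evap, discharge)
    error = noisy_sine_error(len(true_storage), 8.0, 0.5, 2.0, rng)
    grace_obs = [s + e for s, e in zip(true_storage, error)]
    print(true_storage)
    print(closure_residuals(grace_obs, 0.0, precip, evap, discharge))
```

In this setup, the closure residuals computed from the synthetic GRACE series are the month-to-month changes in the storage error, not the error itself. That is why timing (phase) and amplitude corrections of the storage dynamics can be informed by the balance constraint.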