The Promise of Diversity: Distribution-based Hydrologic Model Evaluation
and Diagnostics
Abstract
This paper advocates the use of simulation distributions for hydrologic
model evaluation and model diagnostics. Distribution evaluation is
supported by information-theoretic arguments and puts into modeling
practice the social justice narrative of diversity, equity and inclusion
for different simulations. We discuss past developments that led to the
current state-of-the-art of forecast verification in hydrology and bring
to the fore scoring rules for model evaluation and diagnostics. Strictly
proper scoring rules condense a distribution forecast to a single reward
value for the materialized outcome(s) and have a strong underpinning in
statistical, decision and information theory. We review scoring rules
for dichotomous and categorical events, quantiles (intervals) and
density forecasts, discuss the importance of scoring rule propriety and
address diagnostic aspects such as sharpness, reliability and entropy.
The usefulness and power of scoring rules are demonstrated on simple
benchmark problems and discharge distributions simulated with conceptual
watershed models using GLUE and Bayesian model averaging. We also link
scoring rules to model diagnostics and present strictly proper
divergence scores for flood frequency analysis and flow duration and
recession curves. Scoring rules offer a rigorous information-theoretic
underpinning to model evaluation and diagnostics and provide
statistically principled means for (Bayesian) model selection and the
analysis of hydrograph functionals, flood frequencies and extreme
events.