Ryan Lagerquist

and 3 more

Machine-learned uncertainty quantification (ML-UQ) has become a hot topic in environmental science, especially for neural networks.  Scientists foresee using ML-UQ to make better decisions and to assess the trustworthiness of ML models.  However, because ML-UQ is a new tool, its limitations are not yet fully appreciated.  For example, some types of uncertainty are fundamentally unresolvable, including uncertainty that arises from data being out of sample, i.e., outside the distribution of the training data.  While it is generally recognized that ML-based point predictions (predictions without UQ) do not extrapolate well out of sample, the same awareness does not yet exist for ML-based uncertainty estimates.  When point predictions have a large error, instead of accounting for this error by producing a wider confidence interval, ML-UQ often fails just as spectacularly.  We demonstrate this problem by training ML models with five different UQ methods to predict shortwave radiative transfer.  The ML-UQ models are trained with real data but then tasked with generalizing to perturbed data containing, e.g., fictitious cloud and ozone layers.  We show that ML-UQ completely fails on the perturbed data, which are far outside the training distribution.  We also show that when the training data are lightly perturbed -- so that each basis vector of perturbation has a little variation in the training data -- ML-UQ can extrapolate along the basis vectors with some success, leading to much better (but still somewhat concerning) performance on the validation and testing data.  Overall, we wish to discourage overreliance on ML-UQ, especially in operational environments.
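One simple way to detect the out-of-sample failure mode described above is to compare empirical interval coverage on in-distribution versus perturbed data: if an ML-UQ method is calibrated, roughly 95% of truths should fall inside its central 95% prediction interval on both datasets.  The sketch below is illustrative only and not the paper's methodology; the function name, the ensemble-based interval construction, and the 95% level are all assumptions.

```python
import numpy as np

def interval_coverage(ensemble_preds, y_true, level=0.95):
    """Fraction of targets inside the central prediction interval.

    ensemble_preds: array of shape (n_members, n_samples), one row per
        ensemble member (e.g., from MC dropout or a deep ensemble).
    y_true: array of shape (n_samples,) with the true target values.
    level: nominal coverage of the central interval (e.g., 0.95).
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(ensemble_preds, alpha, axis=0)       # per-sample lower bound
    upper = np.quantile(ensemble_preds, 1.0 - alpha, axis=0)  # per-sample upper bound
    inside = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(inside))
```

Evaluating `interval_coverage` separately on held-out real data and on perturbed data would reveal the failure the abstract describes: coverage near the nominal level in-distribution, but collapsing far below it when the inputs move outside the training distribution.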

Katherine Haynes

and 4 more

Neural networks (NNs) have become an important tool for prediction tasks -- both regression and classification -- in environmental science.  Since many environmental-science problems involve life-or-death decisions and policy-making, it is crucial to provide not only predictions but also an estimate of the uncertainty in those predictions.  Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions.  However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science.  We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely answering the question: Once we obtain an uncertainty estimate (using any approach), how do we know whether it is good or bad?  To answer this question, we highlight four evaluation graphics and eight evaluation scores that are well suited for evaluating and comparing uncertainty estimates (NN-based or otherwise) for environmental-science applications.  We demonstrate the UQ approaches and UQ-evaluation methods for two real-world problems: (1) estimating vertical profiles of atmospheric dewpoint (a regression task) and (2) predicting convection over Taiwan based on Himawari-8 satellite imagery (a classification task).  We also provide Jupyter notebooks with Python code for implementing the UQ approaches and UQ-evaluation methods discussed herein.  This article provides the environmental-science community with the knowledge and tools to start incorporating the large number of emerging UQ methods into their research.
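One widely used score for evaluating ensemble-based uncertainty estimates in regression is the continuous ranked probability score (CRPS); whether CRPS is among the eight scores in the article is not confirmed here, so treat this as a generic sketch.  For a finite ensemble it can be computed with the standard identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the ensemble and y is the observation.

```python
import numpy as np

def ensemble_crps(members, y):
    """CRPS for one scalar observation y, given a 1-D array of ensemble members.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, estimated by
    averaging over members and over all member pairs.  Lower is better;
    a perfect deterministic forecast scores 0.
    """
    members = np.asarray(members, dtype=float)
    skill_term = np.mean(np.abs(members - y))                         # E|X - y|
    spread_term = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))  # 0.5 E|X - X'|
    return skill_term - spread_term
```

For a single-member "ensemble," the spread term vanishes and CRPS reduces to absolute error, which makes the score directly comparable between probabilistic and deterministic forecasts.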