
Machine-learned uncertainty quantification is not magic: Lessons learned from emulating radiative transfer with ML
Ryan Lagerquist, Imme Ebert-Uphoff, David D. Turner, Jebb Q. Stewart
Ryan Lagerquist
Cooperative Institute for Research in the Atmosphere (CIRA), Colorado State University; National Oceanic and Atmospheric Administration (NOAA) Global Systems Laboratory (GSL)

Corresponding Author: [email protected]

Imme Ebert-Uphoff
Cooperative Institute for Research in the Atmosphere (CIRA), Colorado State University; National Oceanic and Atmospheric Administration (NOAA) Global Systems Laboratory (GSL); Department of Electrical and Computer Engineering, Colorado State University
David D. Turner
National Oceanic and Atmospheric Administration (NOAA) Global Systems Laboratory (GSL)
Jebb Q. Stewart
National Oceanic and Atmospheric Administration (NOAA) Global Systems Laboratory (GSL)

Abstract

Machine-learned uncertainty quantification (ML-UQ) has become a hot topic in environmental science, especially for neural networks. Scientists foresee the use of ML-UQ to make better decisions and to assess the trustworthiness of ML models. However, because ML-UQ is a new tool, its limitations are not yet fully appreciated. For example, some types of uncertainty are fundamentally unresolvable, including uncertainty that arises from data being out of sample, i.e., outside the distribution of the training data. While it is generally recognized that ML-based point predictions (predictions without UQ) do not extrapolate well out of sample, the same awareness does not yet exist for ML-based uncertainty. When point predictions have a large error, instead of accounting for this error by producing a wider confidence interval, ML-UQ often fails just as spectacularly. We demonstrate this problem by training ML models with five different UQ methods to predict shortwave radiative transfer. The ML-UQ models are trained with real data but then tasked with generalizing to perturbed data containing, e.g., fictitious cloud and ozone layers. We show that ML-UQ completely fails on the perturbed data, which lie far outside the training distribution. We also show that when the training data are lightly perturbed, so that each perturbation basis vector has some variation in the training data, ML-UQ can extrapolate along the basis vectors with some success, leading to much better (but still somewhat concerning) performance on the validation and testing data. Overall, we wish to discourage overreliance on ML-UQ, especially in operational environments.
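As a concrete illustration of the failure mode described above, the following is a minimal, hypothetical sketch in Python; it is not the paper's code or data. A deep ensemble (one common ML-UQ method; the paper tests five) is trained on a toy 1-D regression problem, standing in for radiative transfer, and then queried far outside the training interval, loosely analogous to the fictitious cloud and ozone perturbations. The toy function, model sizes, and query points are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy stand-in for radiative transfer: training inputs live only in [0, 1].
x_train = rng.uniform(0.0, 1.0, size=(500, 1))
y_train = np.sin(2 * np.pi * x_train[:, 0]) + rng.normal(0.0, 0.1, size=500)

# Deep ensemble: the spread across members is the uncertainty estimate.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                 random_state=seed).fit(x_train, y_train)
    for seed in range(10)
]

def predict_with_uq(x):
    """Return ensemble mean and standard deviation (the UQ estimate)."""
    preds = np.stack([member.predict(x) for member in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

# One in-sample query (x = 0.5) vs. one far out-of-sample query (x = 3.0),
# the latter playing the role of a fictitious cloud/ozone perturbation.
for x_query in (np.array([[0.5]]), np.array([[3.0]])):
    mean, stdev = predict_with_uq(x_query)
    truth = np.sin(2 * np.pi * x_query[0, 0])
    print(f"x = {x_query[0, 0]:.1f}:  point error = {abs(mean[0] - truth):.3f},  "
          f"estimated stdev = {stdev[0]:.3f}")
```

In toy settings like this, the ensemble spread at the out-of-sample point typically does not grow enough to cover the point-prediction error, which is exactly the overconfidence the abstract warns against; the paper's actual experiments use shortwave radiative-transfer profiles and five UQ methods.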
Submitted to ESS Open Archive: 11 Nov 2023
Published in ESS Open Archive: 14 Nov 2023