Machine-learned uncertainty quantification is not magic: Lessons learned from emulating radiative transfer with ML
Machine-learned uncertainty quantification (ML-UQ) has become a hot topic in environmental science, especially for neural networks. Scientists foresee using ML-UQ to make better decisions and to assess the trustworthiness of ML models. However, because ML-UQ is a new tool, its limitations are not yet fully appreciated. For example, some types of uncertainty are fundamentally unresolvable, including uncertainty that arises from data being out of sample, i.e., outside the distribution of the training data. While it is generally recognized that ML-based point predictions (predictions without UQ) do not extrapolate well out of sample, the same awareness does not yet exist for ML-based uncertainty estimates. When point predictions have a large error, ML-UQ often fails just as spectacularly instead of accounting for this error by producing a wider confidence interval. We demonstrate this problem by training ML with five different UQ methods to predict shortwave radiative transfer. The ML-UQ models are trained with real data but then tasked with generalizing to perturbed data containing, e.g., fictitious cloud and ozone layers. We show that ML-UQ completely fails on the perturbed data, which are far outside the training distribution. We also show that when the training data are lightly perturbed, so that they contain a little variation along each basis vector of perturbation, ML-UQ can extrapolate along the basis vectors with some success, leading to much better (but still somewhat concerning) performance on the validation and testing data. Overall, we wish to discourage overreliance on ML-UQ, especially in operational environments.