Ignacio Lopez-Gomez

and 5 more

Most machine learning applications in Earth system modeling currently rely on gradient-based supervised learning. This imposes stringent constraints on the nature of the data used for training (typically, residual time tendencies are needed), and it complicates learning about the interactions between machine-learned parameterizations and other components of an Earth system model. Treating the training of process-based parameterizations as an inverse problem resolves many of these issues, since it allows parameterizations to be trained with partial observations or statistics that directly relate to quantities of interest in long-term climate projections. Here we demonstrate the effectiveness of Kalman inversion methods in solving this inverse problem. We consider two different algorithms: unscented and ensemble Kalman inversion. Both methods involve highly parallelizable forward model evaluations, converge exponentially fast, and do not require gradient computations. In addition, unscented Kalman inversion provides a measure of parameter uncertainty. We illustrate how training parameterizations can be posed as a regularized inverse problem and solved by ensemble Kalman methods through the calibration of an eddy-diffusivity mass-flux scheme for subgrid-scale turbulence and convection, using data generated by large-eddy simulations. We find the algorithms amenable to batching strategies, robust to noise and model failures, and efficient in the calibration of hybrid parameterizations that can include empirical closures and neural networks.
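As a hedged illustration of the derivative-free ensemble Kalman update described above, here is a minimal numpy sketch of one ensemble Kalman inversion (EKI) step. The function name, the toy linear forward map, and all parameter choices are assumptions for illustration; this is not the paper's EDMF calibration code.

```python
import numpy as np

def eki_step(u, forward, y, gamma, rng):
    """One ensemble Kalman inversion update (illustrative sketch).

    u:       (J, d) ensemble of parameter vectors
    forward: parameter -> model output map G
    y:       observed data vector
    gamma:   observation noise covariance
    """
    g = np.array([forward(ui) for ui in u])      # forward evaluations, embarrassingly parallel
    du = u - u.mean(axis=0)
    dg = g - g.mean(axis=0)
    C_ug = du.T @ dg / len(u)                    # parameter-output sample covariance
    C_gg = dg.T @ dg / len(u)                    # output-output sample covariance
    gain = C_ug @ np.linalg.inv(C_gg + gamma)    # Kalman gain
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), gamma, size=len(u))
    return u + (y_pert - g) @ gain.T             # no gradients of G required
```

On a linear toy problem, iterating this update drives the ensemble mean toward the parameters that best match the data, using only forward model evaluations.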

Melanie Bieli

and 5 more

The small-scale microphysical processes governing the formation of precipitation particles cannot be resolved explicitly by cloud resolving and climate models. Instead, they are represented by microphysics schemes that are based on a combination of theoretical knowledge, statistical assumptions, and fitting to data (“tuning”). Historically, tuning was done in an ad hoc fashion, leading to parameter choices that are not explainable or repeatable. Recent work has treated it as an inverse problem that can be solved by Bayesian inference. The posterior distribution of the parameters given the data—the solution of Bayesian inference—is found through computationally expensive sampling methods, which require O(10^5) or more evaluations of the forward model; this is prohibitive for many models. We present a proof-of-concept of Bayesian learning applied to a new bulk microphysics scheme named “Cloudy”, using the recently developed Calibrate-Emulate-Sample (CES) algorithm. Cloudy models collision-coalescence and collisional breakup of cloud droplets with an adjustable number of prognostic moments and with easily modifiable assumptions for the cloud droplet mass distribution and the collision kernel. The CES algorithm uses machine learning tools to accelerate Bayesian inference by reducing the number of forward evaluations needed to O(10^2). It also exhibits a smoothing effect when forward evaluations are polluted by noise. In a suite of perfect-model experiments, we show that CES enables computationally efficient Bayesian inference of parameters in Cloudy from noisy observations of moments of the droplet mass distribution. In an additional imperfect-model experiment, a collision kernel parameter is successfully learned from output generated by a Lagrangian particle-based microphysics model.

Michael F. Howland

and 2 more

Climate models are generally calibrated manually by comparing selected climate statistics, such as the global top-of-atmosphere energy balance, to observations. The manual tuning only targets a limited subset of observational data and parameters. Bayesian calibration can estimate climate model parameters and their uncertainty using a larger fraction of the available data and automatically exploring the parameter space more broadly. In Bayesian learning, it is natural to exploit the seasonal cycle, which in many climate statistics has a large amplitude compared with anthropogenic climate change. In this study, we develop methods for the calibration and uncertainty quantification (UQ) of model parameters exploiting the seasonal cycle, and we demonstrate a proof-of-concept with an idealized general circulation model (GCM). Uncertainty quantification is performed using the calibrate-emulate-sample approach, which combines stochastic optimization and machine learning emulation to speed up Bayesian learning. The methods are demonstrated in a perfect-model setting through the calibration and UQ of a convective parameterization in an idealized GCM with a seasonal cycle. Calibration and UQ based on seasonally averaged climate statistics, compared to annually averaged, reduces the calibration error by up to an order of magnitude and narrows the spread of posterior distributions by factors between two and five, depending on the variables used for UQ. The reduction in the size of the parameter posterior distributions leads to a reduction in the uncertainty of climate model predictions.
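The core idea of exploiting the seasonal cycle can be illustrated with a toy calculation (assumed monthly data and a simple three-month season grouping, not the paper's GCM statistics): seasonal averaging retains the large-amplitude seasonal signal that annual averaging cancels out, giving the calibration more informative targets.

```python
import numpy as np

def seasonal_means(monthly):
    """Climatological mean of each 3-month season; monthly has length n_years * 12."""
    x = monthly.reshape(-1, 12)
    seasons = x.reshape(len(x), 4, 3)          # simple consecutive-month grouping
    return seasons.mean(axis=2).mean(axis=0)   # four values instead of one

def annual_mean(monthly):
    """Single annually averaged climatological statistic."""
    return monthly.reshape(-1, 12).mean()
```

For a purely seasonal signal, the annual mean is zero while the seasonal means retain the full cycle, which is why seasonally resolved statistics constrain parameters more tightly.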

Oliver Dunbar

and 3 more

Targeted high-resolution simulations driven by a general circulation model (GCM) can be used to calibrate GCM parameterizations of processes that are globally unresolvable but can be resolved in limited-area simulations. This raises the question of where to place high-resolution simulations to be maximally informative about the uncertain parameterizations in the global model. Here we construct an ensemble-based parallel algorithm to locate regions that maximize the uncertainty reduction, or information gain, in the uncertainty quantification of GCM parameters with regional data. The algorithm is based on a Bayesian framework that exploits a quantified posterior distribution on GCM parameters as a measure of uncertainty. The algorithm is embedded in the recently developed calibrate-emulate-sample (CES) framework, which performs efficient model calibration and uncertainty quantification with only O(10^2) forward model evaluations, compared with O(10^5) forward model evaluations typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of high-resolution data. In this setting, we calibrate parameters and quantify uncertainties in a quasi-equilibrium convection scheme. We consider (i) localization in space for a statistically stationary problem, and (ii) localization in space and time for a seasonally varying problem. In these proof-of-concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain results from regions near the intertropical convergence zone (ITCZ), and the algorithm indeed automatically targets these regions for data collection.
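One standard way to quantify the information gain from data, sketched here under the assumption of Gaussian prior and posterior approximations (the paper's exact criterion may differ), is the Kullback-Leibler divergence from prior to posterior: regions whose data shrink the posterior most yield the largest divergence.

```python
import numpy as np

def gaussian_kl(m1, C1, m0, C0):
    """KL divergence KL(N(m1, C1) || N(m0, C0)) between Gaussians.

    With N(m1, C1) the posterior and N(m0, C0) the prior, this
    measures the information gained from the regional data.
    """
    d = len(m0)
    C0_inv = np.linalg.inv(C0)
    dm = m0 - m1
    return 0.5 * (np.trace(C0_inv @ C1) + dm @ C0_inv @ dm - d
                  + np.log(np.linalg.det(C0) / np.linalg.det(C1)))
```

Candidate regions can then be ranked by this scalar: an uninformative region leaves the posterior equal to the prior (zero divergence), while a highly informative one shrinks the posterior covariance and drives the divergence up.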

Oliver Dunbar

and 3 more

Parameters in climate models are usually calibrated manually, exploiting only small subsets of the available data. This precludes an optimal calibration and quantification of uncertainties. Traditional Bayesian calibration methods that allow uncertainty quantification are too expensive for climate models; they are also not robust in the presence of internal climate variability. For example, Markov chain Monte Carlo (MCMC) methods typically require O(10^5) model runs, rendering them infeasible for climate models. Here we demonstrate an approach to model calibration and uncertainty quantification that requires only O(10^2) model runs and can accommodate internal climate variability. The approach consists of three stages: (i) a calibration stage uses variants of ensemble Kalman inversion to calibrate a model by minimizing mismatches between model and data statistics; (ii) an emulation stage emulates the parameter-to-data map with Gaussian processes (GP), using the model runs in the calibration stage for training; (iii) a sampling stage approximates the Bayesian posterior distributions by using the GP emulator and then samples using MCMC. We demonstrate the feasibility and computational efficiency of this calibrate-emulate-sample (CES) approach in a perfect-model setting. Using an idealized general circulation model, we estimate parameters in a simple convection scheme from data surrogates generated with the model. The CES approach generates probability distributions of the parameters that are good approximations of the Bayesian posteriors, at a fraction of the computational cost usually required to obtain them. Sampling from this approximate posterior allows the generation of climate predictions with quantified parametric uncertainties.
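The sampling stage (iii) becomes cheap once the GP emulator replaces the climate model, because each MCMC step only evaluates the emulator. A schematic random-walk Metropolis sampler, written as a generic sketch (the log-posterior here would be built from the emulated parameter-to-data map; CES itself uses more careful MCMC settings):

```python
import numpy as np

def metropolis(log_post, theta0, n_steps, step, rng):
    """Random-walk Metropolis sampling of a (cheap, emulated) log posterior."""
    theta = np.array(theta0, dtype=float)
    lp = log_post(theta)
    chain = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
    return np.array(chain)
```

Because each `log_post` call costs microseconds on an emulator rather than hours on a climate model, the O(10^5) MCMC steps that make direct sampling infeasible become affordable, while the expensive model itself is run only O(10^2) times in stages (i)-(ii).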