Pierre Gentine - ESS Open Archive

Pierre Gentine

Public Documents 15

Learning Atmospheric Boundary Layer Turbulence

Sara Shamekh

and 1 more

June 23, 2023

Accurately representing vertical turbulent fluxes in the planetary boundary layer is vital for moisture and energy transport. Nonetheless, the parameterization of the boundary layer remains a major source of inaccuracy in climate models. Recently, machine learning techniques have gained popularity for representing oceanic and atmospheric processes, yet their high dimensionality limits interpretability. This study introduces a new neural network architecture employing non-linear dimensionality reduction to predict vertical turbulent fluxes in a dry convective boundary layer. Our method utilizes turbulent kinetic energy and scalar profiles as input to extract a physically constrained two-dimensional latent space, providing the necessary yet minimal information for accurate flux prediction. We obtained data by coarse-graining Large Eddy Simulations covering a broad spectrum of boundary layer conditions, from weakly to strongly unstable. These regimes are employed to constrain the latent space disentanglement, enhancing interpretability. By applying this constraint, we decompose the vertical turbulent flux of various scalars into two main modes of variability: wind shear and convective transport. Our data-driven parameterization accurately predicts vertical turbulent fluxes (heat and passive scalars) across turbulent regimes, surpassing state-of-the-art schemes like the eddy-diffusivity mass flux scheme. By projecting each variability mode onto its associated scalar gradient, we estimate the diffusive flux and learn the eddy diffusivity. The diffusive flux is found to be significant only in the surface layer for both modes and becomes negligible in the mixed layer. The retrieved eddy diffusivity is considerably smaller than previous estimates used in conventional parameterizations, highlighting the predominant non-diffusive nature of transport.

Interpretable Machine Learning-based Radiation Emulation for ICON

Katharina Hafner

and 6 more

November 15, 2024

The radiation parameterization is one of the computationally most expensive components of Earth system models (ESMs). To reduce computational cost, radiation is often calculated on coarser spatial or temporal scales, or both, than other physical processes in ESMs, leading to uncertainties in cloud-radiation interactions and thereby in radiative temperature tendencies. One way around this issue is the emulation of the radiation parameterization using machine learning which is usually faster and has good accuracy in a high dimensional parameter space. This study investigates the development and interpretation of a machine learning based radiation emulator using the ICOsahedral Non-hydrostatic (ICON) model with the RTE-RRTMGP radiation code which calculates radiative fluxes based on the atmospheric state and its optical properties. With a Bidirectional Long Short-Term Memory (Bi-LSTM) architecture, which can account for vertical bidirectional auto-correlation, we can accurately emulate shortwave and longwave heating rates with a mean absolute error of $0.049~K/d\,(2.50\%)$ and $0.069~K/d\,(5.14\%)$ respectively. Further, we analyse the trained neural networks using Shapley Additive exPlanations (SHAP) and confirm that the networks have learned physical meaningful relationships among the inputs and outputs. Notably, we observe that the local temperature is used as a predictive source for the longwave heating, consistent with physical models of radiation. For shortwave heating, we find that clouds reflect radiation, leading to reduced heating below the cloud.

Parameter Estimation in Land Surface Models: Challenges and Opportunities with Data A...

Nina Raoult

and 28 more

October 08, 2024

Accurately predicting terrestrial ecosystem responses to climate change is crucial for addressing global challenges. This relies on mechanistic modelling of ecosystem processes through Land Surface Models (LSMs). Despite their importance, LSMs face significant uncertainties due to poorly constrained parameters, especially in carbon cycle predictions. This paper reviews the progress made in using data assimilation (DA) for LSM parameter optimisation, focusing on carbon-water-vegetation interactions, as well as discussing the technical challenges faced by the community. These challenges include identifying sensitive model parameters and their prior distributions, characterising errors due to observation biases and model-data inconsistencies, developing observation operators to interface between the model and the observations, tackling spatial and temporal heterogeneity as well as dealing with large and multiple datasets, and including the spin-up and historical period in the assimilation window. We then outline how machine learning (ML) can help address these issues, proposing different avenues for future work that integrate ML and DA to reduce uncertainties in LSMs. We conclude by highlighting future priorities, including the need for international collaborations, to fully leverage the wealth of available Earth observation datasets, harness machine learning advances, and enhance the predictive capabilities of LSMs.

Sampling Hybrid Climate Simulation at Scale to Reliably Improve Machine Learning Para...

Jerry Lin

and 8 more

July 11, 2024

Machine-learning (ML) parameterizations of subgrid processes (here of turbulence, convection, and radiation) may one day replace conventional parameterizations by emulating high-resolution physics without the cost of explicit simulation. However, their development has been stymied by uncertainty surrounding whether or not improved offline performance translates to improved online performance (i.e., when coupled to a large-scale general circulation model (GCM)). A key barrier has been the limited sampling of the online effects of the ML design decisions and tuning due to the complexity of performing large ensembles of hybrid physics-ML climate simulations. Our work examines the coupled behavior of full-physics ML parameterizations using large ensembles of hybrid simulations, totalling 2,970 in our case. With extensive sampling, we statistically confirm that lowering offline error lowers online error (given certain constraints). However, we also reveal that decisions decreasing online error, like removing dropout, can trade off against hybrid model stability and vice versa. Nevertheless, we are able to identify design decisions that yield unambiguous improvements to offline and online performance, namely incorporating memory and training on multiple climates. We also find that converting moisture input from specific to relative humidity enhances online stability and that using a Mean Absolute Error (MAE) loss breaks the aforementioned offline/online error relationship. By enabling rapid online experimentation at scale, we empirically answer previously unresolved questions regarding subgrid ML parameterization design.

Data-Driven Equation Discovery of a Cloud Cover Parameterization

Arthur Grundner

and 3 more

April 18, 2023

A promising method for improving the representation of clouds in climate models, and hence climate projections, is to develop machine learning-based parameterizations using output from global storm-resolving models. While neural networks can achieve state-of-the-art performance, they are typically climate model-specific, require post-hoc tools for interpretation, and struggle to predict outside of their training distribution. To avoid these limitations, we combine symbolic regression, sequential feature selection, and physical constraints in a hierarchical modeling framework. This framework allows us to discover new equations diagnosing cloud cover from coarse-grained variables of global storm-resolving model simulations. These analytical equations are interpretable by construction and easily transferable to other grids or climate models. Our best equation balances performance and complexity, achieving a performance comparable to that of neural networks ($R^2=0.94$) while remaining simple (with only 13 trainable parameters). It reproduces cloud cover distributions more accurately than the Xu-Randall scheme across all cloud regimes (Hellinger distances $<0.09$), and matches neural networks in condensate-rich regimes. When applied and fine-tuned to the ERA5 reanalysis, the equation exhibits superior transferability to new data compared to all other optimal cloud cover schemes. Our findings demonstrate the effectiveness of symbolic regression in discovering interpretable, physically-consistent, and nonlinear equations to parameterize cloud cover.

Modeling wildfire activity in the western United States with machine learning

Jatan Buch

and 4 more

December 11, 2022

The annual area burned due to wildfires in the western United States (WUS) increased by more than 300% between 1984 and 2020. However, accounting for the nonlinear, spatially heterogeneous interactions between climate, vegetation, and human predictors driving the trends in fire frequency and sizes at different spatial scales remains a challenging problem for statistical fire models. Here we introduce a novel stochastic machine learning (ML) framework to model observed fire frequencies and sizes in 12 km x 12 km grid cells across the WUS. This framework is implemented using Mixture Density Networks trained on a wide suite of input predictors. The modeled WUS fire frequency corresponds well with observations at both monthly (r= 0.94) and annual (r= 0.85) timescales, as do the monthly (r= 0.90) and annual (r= 0.88) area burned. Moreover, the annual time series of both fire variables exhibit strong correlations (r >= 0.6) in 16 out of 18 ecoregions. Our ML model captures the interannual variability and the distinct multidecade increases in annual area burned for both forested and non-forested ecoregions. Evaluating predictor importance with Shapley additive explanations, we find that fire month vapor pressure deficit (VPD) is the dominant driver of fire frequencies and sizes across the WUS, followed by 1000-hour dead fuel moisture (FM1000), total monthly precipitation (Prec), mean daily maximum temperature (Tmax), and fraction of grassland cover in a grid cell. Our findings serve as a promising use case of ML techniques for wildfire prediction in particular and extreme event modeling more broadly.

Understanding and Improving Greenland Ice Albedo in Climate Models

Raf Antwerpen

and 5 more

April 15, 2024

A document by Raf Antwerpen. Click on the document to view its contents.

Constraining respiration flux and carbon pools in a simple ecosystem carbon model

Olya Skulovich

and 3 more

February 04, 2024

Incorporating observational data in carbon-cycle models provides a systematic framework for understanding complex ecosystem carbon dynamics, contributing essential insights for climate change mitigation and land ability to continue acting as a carbon sink. This study addresses the challenge of accurately quantifying carbon fluxes and pools, focusing on the information content of remote sensing observations. The research explores the impact of assimilating multiple observational datasets into the CARbon DAta MOdel fraMework (CARDAMOM). Satellite observations such as solar-induced fluorescence (SIF) and vegetation optical depth (VOD) are used as proxies for photosynthesis and aboveground biomass, respectively. The study aims to answer key questions about the reliability of remote sensing data in constraining the ecosystem respiration flux and sizes and dynamics of carbon pools and the relative usefulness of SIF and VOD across five FLUXNET sites. We conclude that assimilating remote SIF and VOD instead of site-based net ecosystem exchange did not deteriorate and even improved model predictions for all metrics except for interannual variability. Notably, the improved results correspond to a consistent shift in values for crucial model parameters across all five investigated sites.

Hybrid Modeling of Evapotranspiration: Inferring Stomatal and Aerodynamic Resistances...

Reda ElGhawi

and 6 more

September 27, 2022

The process of evapotranspiration transfers water vapour from vegetation and soil surfaces to the atmosphere, the so-called latent heat flux (𝑄 LE), and thus crucially modulates Earth’s energy, water, and carbon cycles. Vegetation controls 𝑄 LE through regulating the leaf stomata (i.e., surface resistance 𝑟 s) and through altering surface roughness (aerodynamic resistance 𝑟 a). Estimating 𝑟 s and 𝑟 a across different vegetation types proves to be a key challenge in predicting 𝑄 LE. Here, we propose a hybrid modeling approach (i.e., combining mechanistic modeling and machine learning) for 𝑄 LE where neural networks independently learn the resistances from observations as intermediate variables. In our hybrid modeling setup, we make use of the Penman-Monteith equation based on the Big Leaf theory in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. We follow two conceptually different strategies to constrain the hybrid model to control for equifinality arising when estimating the two resistances simultaneously. One strategy is to impose an a priori constraint on 𝑟 a based on our mechanistic understanding (theory-driven strategy), while the other strategy makes use of more observational data and adds a constraint in predicting 𝑟 a through multi-task learning of the latent as well as the sensible heat flux (𝑄 H ; data-driven strategy). Our results show that all hybrid models exhibit a fairly high predictive skill for the target variables with 𝑅 2 = 0.82-0.89 for grasslands and 𝑅 2 = 0.70-0.80 for forests sites at the mean diurnal scale. The predictions of 𝑟 s and 𝑟 a show physical consistency across the two regularized hybrid models, but are physically implausible in the under-constrained hybrid model. The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose to test these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for overly simple ad hoc formulations in Earth system models.

Weekly to annual variability of surface soil moisture

Xuan Xi

and 1 more

July 11, 2020

Soil moisture is important for sub-seasonal and seasonal climate prediction. However, biases and uncertainties of soil moisture in climate models affect the accuracy of climate prediction. Here we evaluate biases in climate model soil moisture across different time scales in the frequency domain. Based on our findings, compared to observations, soil moisture variability in the models is found to be underestimated at frequencies smaller than the seasonal time scale and overestimated at frequencies larger than the seasonal time scale. In addition, for the total effect of evapotranspiration and precipitation variability on soil moisture, models also underestimate frequencies smaller than the seasonal time scale and overestimate frequencies larger than it. Furthermore, no matter which factor (evapotranspiration or precipitation) is most affecting soil moisture, models underestimate its effect on soil moisture in the corresponding frequency range. Finally, at a global scale, biases in climate models can be related to the mean climate and not to soil properties. This study provides new insights into climate models deficiencies, and contributes to a better understanding of soil moisture and climate.

Implicit learning of convective organization explains precipitation stochasticity

Sara Shamekh

and 3 more

September 29, 2022

Accurate prediction of precipitation intensity is of crucial importance for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet climate models fail to accurately predict precipitation intensity, particularly extremes. One missing piece of information in traditional climate model parameterizations is sub-grid scale cloud structure and organization, which affects precipitation intensity and stochasticity at the grid scale. Here we show, using storm-resolving climate simulations and machine learning, that by implicitly learning sub-grid organization, we can accurately predict precipitation variability and stochasticity with a low dimensional set of variables. Using a neural network to parameterize coarse-grained precipitation, we find mean precipitation is predictable from large scale quantities only; however, the neural network cannot predict the variability of precipitation (R 2 ∼ 0.4) and underestimates precipitation extremes. Performance is significantly improved when the network is informed by our novel organization metric, correctly predicting precipitation extremes and spatial variability (R 2 ∼ 0.95). The organization metric is implicitly learned by training the algorithm on high-resolution precipitable water, encoding organization degree and humidity amount at the subgrid-scale. The organization metric shows large hysteresis, emphasizing the role of memory created by sub-grid scale structures. We demonstrate this organization metric can be predicted as a simple memory process from information available at the previous time steps. These findings stress the role of organization and memory in accurate prediction of precipitation intensity and extremes and the necessity of parameterizing sub-grid scale convective organization in climate models to better project future changes in the water cycle and extremes.

Evaluating the effects of precipitation and evapotranspiration on soil moisture varia...

Xuan Xi

and 3 more

April 28, 2022

The effects of precipitation (Pr) and evapotranspiration (ET) on soil moisture play an essential role in the land-atmosphere system. Here we evaluate multimodel differences of these effects within the Coupled Model Intercomparison Project Phase 5 (CMIP5) compared to Soil Moisture Active Passive (SMAP) products in the frequency domain. The variability of surface soil moisture (SSM), Pr, and ET within three frequency bands (7 ~ 30 days, 30 ~ 90 days, and 90 ~ 365 days) after normalization is quantified using Fourier transform. We then analyze the impact of ET and Pr on SSM variability based on a transfer function assuming these variables with a linear time-invariant (LTI) system. For the simulated effects of ET and Pr on SSM variability, models underestimate them in the two higher frequency bands and overestimate them in the lowest frequency band but show better estimates in transitional zones between dry and wet climates. Besides, the effects on SSM by Pr and ET are found to be different across the three frequency bands, and models underestimate the one of Pr and ET as the dominant factor controlling SSM variability in each frequency band. This study identifies the spatiotemporal distribution of the CMIP5 model deficiencies in simulating ET and Pr effects on SSM. Overcoming these deficiencies could improve the interpretability and predictability of Earth system models in simulating interactions among the three variables.