Sampling Hybrid Climate Simulation at Scale to Reliably Improve Machine
Learning Parameterization
Abstract
Machine-learning (ML) parameterizations of subgrid processes (here, turbulence, convection, and radiation) may one day replace conventional
parameterizations by emulating high-resolution physics without the cost
of explicit simulation. However, their development has been stymied by
uncertainty surrounding whether improved offline performance
translates to improved online performance (i.e., when coupled to a
large-scale general circulation model (GCM)). A key barrier has been the
limited sampling of the online effects of ML design decisions and tuning, due to the complexity of performing large ensembles of hybrid
physics-ML climate simulations. Our work examines the coupled behavior
of full-physics ML parameterizations using large ensembles of hybrid
simulations, totaling 2,970 in our case. With extensive sampling, we
statistically confirm that lowering offline error lowers online error
(given certain constraints). However, we also reveal that decisions that decrease online error, such as removing dropout, can trade off against hybrid model stability, and vice versa. Nevertheless, we are able to
identify design decisions that yield unambiguous improvements to offline
and online performance, namely incorporating memory and training on
multiple climates. We also find that converting the moisture input from specific to relative humidity enhances online stability, and that using a
Mean Absolute Error (MAE) loss breaks the aforementioned offline/online
error relationship. By enabling rapid online experimentation at scale,
we empirically answer previously unresolved questions regarding subgrid
ML parameterization design.