Benchmarking of machine learning ocean subgrid parameterizations in an
idealized model
Abstract
Recently, a growing number of studies have used machine learning (ML)
models to parameterize computationally intensive subgrid-scale processes
in ocean models. Such studies typically train ML models with filtered
and coarse-grained high-resolution data and evaluate their predictive
performance offline, before implementing them in a coarse-resolution
model and assessing their online performance. In this work, we
systematically benchmark the online performance of such models, their
generalization to domains not encountered during training, and their
sensitivity to dataset design choices. We apply this framework
to compare a large number of physical and neural network (NN)-based
parameterizations. We find that the choice of filtering and
coarse-graining operator is particularly critical, and that it should
be guided by the intended application. We also show that all of our
physics-constrained NNs are stable and perform well when implemented
online, but generalize poorly to new regimes. To improve both
generalization and interpretability, we propose a novel equation-discovery
approach combining linear regression and genetic programming with
spatial derivatives. We find that this approach performs on par with neural
networks on the training domain but generalizes better beyond it. We
release code and data to reproduce our results and provide the research
community with easy-to-use resources to develop and evaluate additional
parameterizations.
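
Illustrative note (not from the paper): the abstract highlights that the
choice of filtering and coarse-graining operator strongly affects results.
The short Python sketch below shows one simple example of such an operator,
spectral truncation of a doubly periodic two-dimensional snapshot followed
by evaluation on a coarser grid. The function name spectral_coarse_grain,
the use of NumPy, and the periodic-domain assumption are illustrative
choices only; they are not necessarily among the operators compared in the
paper.

    import numpy as np

    def spectral_coarse_grain(field_hr, factor):
        # Low-pass filter a doubly periodic 2D field by spectral truncation,
        # then return it on a grid `factor` times coarser in each direction.
        n = field_hr.shape[0]
        assert field_hr.shape == (n, n) and n % factor == 0
        n_lr = n // factor
        keep = n_lr // 2

        # Forward FFT of the high-resolution snapshot.
        field_hat = np.fft.fft2(field_hr)

        # Copy only the wavenumbers representable on the coarse grid.
        coarse_hat = np.zeros((n_lr, n_lr), dtype=complex)
        coarse_hat[:keep, :keep] = field_hat[:keep, :keep]
        coarse_hat[:keep, -keep:] = field_hat[:keep, -keep:]
        coarse_hat[-keep:, :keep] = field_hat[-keep:, :keep]
        coarse_hat[-keep:, -keep:] = field_hat[-keep:, -keep:]

        # Inverse FFT on the coarse grid; rescale so amplitudes are preserved
        # despite the unnormalized forward transform on the finer grid.
        return np.real(np.fft.ifft2(coarse_hat)) / factor**2

    # Example: coarse-grain a 256 x 256 snapshot to 64 x 64.
    snapshot = np.random.randn(256, 256)
    coarse = spectral_coarse_grain(snapshot, factor=4)

In studies of this kind, the ML training targets (subgrid forcings) are
typically diagnosed as the difference between the filtered and
coarse-grained high-resolution tendency and the tendency computed from the
coarse-grained state, which is why the operator choice propagates into
every downstream result.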