Conclusions
The application of machine learning to the prediction of future climate
states has, perhaps justifiably due to the challenges laid out above,
been cautious to date. Particular applications, however, with carefully
chosen training data and objectives, can provide fruitful avenues for
research and open exciting opportunities for improvement over the
current state-of-the-art. This paper introduces the ClimateBench dataset
in order to galvanise existing research in this area, provide a standard
objective with which to compare approaches and also introduce new
researchers to the challenge of climate emulation. It provides a diverse
set of training data with clear objectives and challenging target
variables, some of which have been extensively studied (surface air
temperature) and some of which have been somewhat neglected (diurnal
temperature range and precipitation).
Current impact assessments are often based on simple emulators, which
are then scaled to match modelled patterns, but which are unable to
predict non-linear responses in, for example, precipitation. A robust,
trustworthy emulator which is able to provide such predictions could be
immensely valuable in quantifying and understanding the changes and
associated risks of different socio-economic pathways. Given the
importance of faithfully and accurately reproducing the response of
ESMs, we hope the challenge will also spur innovation in nascent
physically informed ML techniques.
In order to meet these objectives, we have provided open, easy-to-access
datasets and training notebooks which reproduce the results shown in
this manuscript and demonstrate the use of the different baseline
emulators. All software is open source and readily available through
commonly used package managers. We hope this dataset will provide a
focus for climate and ML researchers to advance the field of climate
model emulation and provide policy makers with the tools they require to
make well-informed decisions.