Data set description and preparation
The data provided as part of ClimateBench is a heavily curated version
of the data publicly available in the CMIP6 data archive. Here we describe
the data extraction and processing steps; the scripts used to
perform them are also freely available (as described below).
We use a selection of complementary simulations in order to provide as
large a training dataset as possible while attempting to avoid
unnecessary redundancy. Table 1 details the full list of simulations
included, the period they cover and a brief description of their purpose
in this context. Given that the primary purpose of ClimateBench is to
train emulators over different emission scenarios, ScenarioMIP
simulations are a key component of the dataset. ScenarioMIP prescribes a
limited set of future emissions pathways, each following a plausible
socio-economic narrative. These scenarios are designed to span a range
of socio-economic pathways with differing mitigation challenges (denoted
by the first digit of each scenario name) and end-of-century radiative
forcing levels (denoted by the last two digits, e.g. 45 for 4.5 W m⁻² in 2100). We
include all available simulations, including the AerChemMIP ssp370-lowNTCF variation of ssp370, which includes lower
emissions of near-term climate forcers (NTCFs) such as aerosol (but not
methane). We choose ssp245 as the test dataset against which all
ClimateBench emulators are to be evaluated. This scenario represents
medium mitigation and medium forcing, so that trained emulators need
only interpolate between the training scenarios rather than extrapolate
beyond them (as discussed further in Section 5). The CMIP6 historical
experiment is also included since it provides useful training data at
lower emission levels.
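To make the scenario naming convention and the interpolation argument concrete, the following is a minimal Python sketch. The helpers `parse_scenario` and `split_train_test` are hypothetical and not part of the ClimateBench scripts; they assume scenario names follow an `sspXYZ[-variant]` pattern, read the last two digits as the nominal 2100 forcing in W m⁻², and use an illustrative scenario list rather than the actual contents of Table 1.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pattern: "ssp" + one SSP digit + two forcing digits + optional suffix,
# e.g. "ssp245" or "ssp370-lowNTCF". This is an assumption, not a ClimateBench API.
SSP_PATTERN = re.compile(r"^ssp(\d)(\d{2})(?:-(\w+))?$")


@dataclass(frozen=True)
class Scenario:
    name: str
    ssp: int                # socio-economic pathway (first digit)
    forcing: float          # nominal 2100 radiative forcing in W m^-2 (last two digits / 10)
    variant: Optional[str]  # optional suffix such as "lowNTCF"


def parse_scenario(name: str) -> Scenario:
    """Decode an SSP scenario name; raise ValueError for non-SSP experiments."""
    match = SSP_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an SSP scenario name: {name!r}")
    ssp, forcing, variant = match.groups()
    return Scenario(name, int(ssp), int(forcing) / 10, variant)


def split_train_test(
    names: List[str], test: str = "ssp245"
) -> Tuple[List[str], str, bool]:
    """Hold out `test` and report whether its forcing lies inside the training range."""
    if test not in names:
        raise ValueError(f"test scenario {test!r} not in {names}")
    train = [n for n in names if n != test]
    # Only SSP-named runs carry a forcing level, so e.g. "historical" is skipped here.
    forcings = [parse_scenario(n).forcing for n in train if SSP_PATTERN.match(n)]
    target = parse_scenario(test).forcing
    interpolates = bool(forcings) and min(forcings) <= target <= max(forcings)
    return train, test, interpolates


# Illustrative scenario list (an assumption; Table 1 gives the actual set).
names = ["historical", "ssp126", "ssp370", "ssp370-lowNTCF", "ssp585", "ssp245"]
train, test, interpolates = split_train_test(names)
print(train)         # ['historical', 'ssp126', 'ssp370', 'ssp370-lowNTCF', 'ssp585']
print(interpolates)  # True: 4.5 lies between 2.6 and 8.5
```

Non-SSP experiments such as historical remain in the training set but are ignored by the forcing check, since their names carry no forcing level.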
Table 1: Details of post-processed simulations provided as part of the ClimateBench dataset