Figure 5: Permutation importances for the most important component of each variable in predicting global mean temperature (TAS) and precipitation (PR). Each emulator input variable is shuffled in turn to determine its relative contribution to prediction skill. Note that these global-mean estimates do not account for potential regional contributions, which may be particularly relevant for aerosols.
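The shuffling procedure behind Figure 5 can be sketched as follows. This is a minimal, generic implementation of permutation importance, not the exact code used in this study: the skill metric (RMSE), the number of repeats, and the `predict` interface are all illustrative assumptions.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=5, seed=None):
    """Estimate each input's contribution to skill by shuffling it in turn.

    `model` is any object exposing a `predict(X)` method; `X` is an
    (n_samples, n_features) array. Importance is reported as the mean
    increase in RMSE when a feature is permuted (an assumed metric).
    """
    rng = np.random.default_rng(seed)
    baseline = np.sqrt(np.mean((model.predict(X) - y) ** 2))
    importances = []
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between input j and the target
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            rmse = np.sqrt(np.mean((model.predict(X_perm) - y) ** 2))
            scores.append(rmse)
        importances.append(np.mean(scores) - baseline)
    return np.array(importances)
```

An input that the model ignores yields an importance of zero, since permuting it leaves the predictions unchanged.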

Neural Networks

Artificial Neural Networks (ANNs), algorithms inspired by the biological neural networks of human brains, have shown great success in areas like Computer Vision and Natural Language Processing. Two major architectures are Convolutional Neural Networks (CNNs) (LeCun et al., 1990), which model spatial dependencies, and Recurrent Neural Networks (RNNs), which process sequential data. Beyond these traditional areas, ANNs have recently been employed to tackle a variety of problems in Earth system science (Camps-Valls et al., 2021). Long short-term memory (LSTM) networks (Hochreiter & Schmidhuber, 1997), an advanced type of RNN, are used for modelling time series, for example for El NiƱo-Southern Oscillation prediction (Broni-Bedaiko et al., 2019). In cases where both input and target have a spatial structure, such as modelling of precipitation or changes in satellite imagery, a very commonly used CNN type is the U-Net (Ronneberger et al., 2015), which has been applied frequently in climate science and weather forecasting (Trebing et al., 2020; Harder et al., 2020).
We explored both a pure LSTM approach and a pure CNN approach, using a U-Net. A combination of both network types gave the best results, so we use an LSTM combined with a CNN for our example architecture. The CNN is used to extract spatial features before feeding our input into the LSTM. The CNN consists of one convolutional layer with a kernel size of 6, followed by a ReLU activation function and average pooling. The LSTM uses 25 units, also with a ReLU activation function, and is followed by a dense layer and a reshaping to the output dimension. To train the emulator we use the ssp126, ssp370 and ssp585 scenarios and the historical data with a moving time window of 10 years (in one-year increments, leading to 570 training points). The emulator is trained for 20 epochs, using a batch size of 25 for TAS and DTR and 5 for PR and PR90. For this baseline approach we chose not to do any hyperparameter optimization.
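The layer sequence described above can be sketched in Keras roughly as follows. This is a schematic reconstruction, not the authors' exact code: the grid size, the number of convolutional filters, the choice of pooling window, and the way spatial features are flattened before the LSTM are all illustrative assumptions; the kernel size of 6, the 25 LSTM units, the ReLU activations, and the 10-year input window follow the description in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(time_steps=10, lat=96, lon=144, channels=4):
    """Sketch of a CNN-LSTM emulator: per-year spatial feature
    extraction by a CNN, then an LSTM over the 10-year window,
    then a dense layer reshaped to the output grid."""
    inputs = layers.Input(shape=(time_steps, lat, lon, channels))
    # Apply the same convolutional feature extractor to every year in the window
    x = layers.TimeDistributed(
        layers.Conv2D(20, kernel_size=6, activation="relu"))(inputs)
    x = layers.TimeDistributed(layers.AveragePooling2D(2))(x)
    # Collapse the remaining spatial dimensions into a feature vector per year
    # (an assumption; the original flattening scheme is not specified)
    x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)
    x = layers.LSTM(25, activation="relu")(x)
    x = layers.Dense(lat * lon)(x)
    outputs = layers.Reshape((lat, lon))(x)
    return models.Model(inputs, outputs)
```

The model maps a window of 10 annual forcing fields to a single output map, matching the moving-window training setup described above.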
RMSE scores obtained with the CNN-LSTM architecture are comparable to those obtained with the other methods. The CNN-LSTM architecture performs particularly well for temperature predictions, with average RMSE scores of 0.38 K over the second half of the 21st century. This might be because temperature has greater autocorrelation, and less variability from one year to the next, than the other variables; such autocorrelation would be well captured by a time-aware model like an RNN. Spatial patterns of temperature change, such as the Arctic amplification, are reasonably well predicted, even though the coldest temperatures (e.g. in the North Atlantic cold patch) are not as well captured (as shown in Figure 4). The CNN-LSTM performs slightly worse than the other emulators for diurnal temperature range and precipitation predictions. For precipitation, global patterns (e.g. the ITCZ shift) are well predicted by the emulator, but the relative changes are overestimated (too wet or too dry) in most places. As for all other emulators showcased in this study, extreme precipitation proves the hardest variable to predict accurately.
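A global-mean RMSE like the 0.38 K quoted above is typically computed with an area weighting over the latitude-longitude grid, since grid cells shrink toward the poles. The cosine-latitude weighting below is a standard convention and an assumption of this sketch; the exact averaging used in the study may differ.

```python
import numpy as np

def weighted_rmse(pred, truth, lats):
    """Area-weighted RMSE over a (lat, lon) grid.

    `pred` and `truth` are 2-D arrays on the same grid; `lats` gives
    the grid latitudes in degrees. Weighting by cos(latitude) is an
    assumed (but conventional) approximation of grid-cell area.
    """
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(pred)
    return float(np.sqrt(np.average((pred - truth) ** 2, weights=weights)))
```

A spatially uniform error of 0.38 K would give a weighted RMSE of exactly 0.38 K; spatially varying errors (e.g. larger misfits in the North Atlantic cold patch) contribute according to the area they cover.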