Fig. 7. Observed (black) and simulated CMIP5 and CMIP6 SST anomalies (relative to 1901-1950) for the North Atlantic (NA, left column), the Global Tropics (GT, middle column), and the North Atlantic Relative Index (NARI, right column) when forced with ALL (blue, top row), AA (magenta, second row), NAT (brown/red, third row), and GHG (green, bottom row). The CMIP6 MMMs are shown as solid curves and the CMIP5 MMMs as dotted curves; both are surrounded by shaded areas marking the bootstrapped confidence interval. Panels (a) and (c) additionally display a 20-year running mean of the sum of simulated NA and NARI over the individual forcing simulations for CMIP6 (burgundy dashed curve) with the associated bootstrapped confidence interval (burgundy shaded area). Including NA in the sum makes little difference. For NA and GT under AA and NAT (middle two rows and left two columns), the orange curve displays detrended observations, calculated by subtracting simulated GHG-forced SST (bottom row) from observations in that ocean basin. The yellow shaded area is the confidence interval obtained by bootstrapping the MMM of CMIP6 piC simulations; it represents the magnitude of noise in the CMIP6 MMMs. A horizontal black dashed line marks zero anomaly, i.e., the average SST over 1901-1950. The y-axis labels show the number of institutions used for each subset of forcing agents in CMIP6 (N, see Table S2), and the subplot titles display the correlation (r) and sRMSE between the MMM and observations for CMIP6.
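As an illustration of how curves and shaded intervals of the kind shown in Fig. 7 could be constructed, the following minimal Python sketch computes anomalies relative to 1901-1950, a bootstrapped confidence interval for a multi-model mean, and a 20-year running mean. It is not the code used to produce the figure: the synthetic data, the function names, the number of bootstrap resamples, and the 95% level are all assumptions made for illustration.

```python
import numpy as np

def anomaly(series, years, base=(1901, 1950)):
    """Subtract the base-period mean (inclusive bounds) along the time axis."""
    mask = (years >= base[0]) & (years <= base[1])
    return series - series[..., mask].mean(axis=-1, keepdims=True)

def bootstrap_mmm_ci(members, n_boot=1000, alpha=0.05, seed=0):
    """Resample members (rows) with replacement and return the
    multi-model mean plus a percentile confidence interval."""
    rng = np.random.default_rng(seed)
    n = members.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))   # resampled member indices
    boot_means = members[idx].mean(axis=1)       # shape (n_boot, n_years)
    lo, hi = np.percentile(
        boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0
    )
    return members.mean(axis=0), lo, hi

def running_mean(x, window=20):
    """Unweighted running mean; output is shorter by window - 1 points."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

# Synthetic stand-in for an ensemble: 10 members, annual values 1901-2014.
years = np.arange(1901, 2015)
rng = np.random.default_rng(1)
members = 0.005 * (years - 1901) + rng.normal(0.0, 0.2, size=(10, years.size))

anom = anomaly(members, years)
mmm, lo, hi = bootstrap_mmm_ci(anom)
smooth = running_mean(mmm, window=20)
```

Resampling whole members, rather than individual years, keeps each simulated time series intact, so the interval reflects the spread between members rather than year-to-year noise.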
Observed NARI (panel c, black) shows strong multi-decadal variability throughout the century. In the ALL simulations (top row, blue), the temporal evolution of NARI (c) matches the observations with some skill (r = 0.40, sRMSE = 0.92 for CMIP6), but fails to capture the full magnitude of observed cooling in the 1970s and 1980s or, more prominently, any multi-decadal variability prior to 1960. Moreover, its GT and NA components match neither the observed, roughly linear warming trend in GT (b) nor the marked multi-decadal variability in NA (a) very well. In both CMIP5 and CMIP6 ALL simulations, simulated GT (b, blue) is colder than observed between 1960 and 2000, when simulated AA cooling (e, magenta) is strongest and not yet compensated by GHG warming (k, green). This leads us to question whether the match between simulated and observed NARI in this period arises from compensating errors. For NA, the match between observations and the ALL-forced response is better in the later part of the record, but worse in the first half. During the period prior to 1960, according to both CMIP ensembles, GHG warming (j, green) masks AA cooling (d, magenta) to produce a roughly constant temperature in the ALL simulations (a, blue). The simulated cold episode in 1964 is due to the eruption of Agung in 1963 (g, brown and red), and it is only after the mid-1960s that increased GHG warming overtakes stagnating AA cooling to produce pronounced warming in fairly good agreement with observations. Much of the observed variability in NA (a, black) thus does not seem to be a response to external radiative forcing.
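The two skill scores quoted in this section can be illustrated with a short Python sketch. The definition of sRMSE is not restated here, so we assume, for illustration only, that it denotes the root mean squared error divided by the standard deviation of the observations, in which case a value near 1 corresponds to no skill. The function names and the toy series are hypothetical.

```python
import numpy as np

def pearson_r(sim, obs):
    """Pearson correlation between a simulated and an observed series."""
    return float(np.corrcoef(sim, obs)[0, 1])

def srmse(sim, obs):
    """RMSE normalized by the standard deviation of the observations.
    ASSUMPTION: this normalization is our reading of 'sRMSE'."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rmse = np.sqrt(np.mean((sim - obs) ** 2))
    return float(rmse / np.std(obs))

# Toy example: the "simulation" has the right phase but half the amplitude.
t = np.arange(100)
obs = np.sin(2 * np.pi * t / 50)
sim = 0.5 * obs

r_value = pearson_r(sim, obs)
srmse_value = srmse(sim, obs)
```

Under this assumed definition, a simulation can be well correlated with observations and still have a large sRMSE if it underestimates the amplitude of variability, which is why the two scores are reported together.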
The AA forcing had appeared to explain observed low-frequency Sahel precipitation variability in H20, but we now see that it might be the right result for the wrong reason. AA (second row, magenta) produce low-frequency NARI variability (f), but this simulated NARI is a poor match to observations (f, r = 0.10, sRMSE = 1.04 for CMIP5; r = 0.07, sRMSE = 1.09 for CMIP6; a performance statistically worse than noise). The difference between simulations and observations is even starker in NARI’s constituent ocean basins. We can attempt to compare AA-forced NA and GT to an observed “GHG-residual” (that is, the observations minus the GHG-forced MMM, presented in orange instead of black), which represents our best estimate of the sum of observed oceanic IV and the observed responses to aerosols. This index shows marked, roughly stationary low-frequency variability in NA (d, orange), which contrasts with the more monotonic behavior of the simulated NA index (magenta). In particular, the AA simulations display monotonic cooling throughout the century, with an especially steep decline in NA SST between ~1940 and 1980. Though legislation to curb pollution reduced AA loading in the northern hemisphere after 1970 (Hirasawa et al. 2020), simulated NA does not warm at all before 2010. Overall, the effect of reducing AA emissions in both CMIP ensembles is to halt the cooling of NA, not to cause actual warming. This is consistent with estimates of the hemispheric difference in total absorbed solar radiation in AA simulations in CMIP6, which level off, but do not decrease, after 1970 (Menary et al. 2020).
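The GHG-residual described above is a simple subtraction, sketched here in Python on synthetic data. The function name, the array layout (one row per GHG-only simulation), and the toy series are assumptions made for illustration and do not reproduce the actual analysis.

```python
import numpy as np

def ghg_residual(obs, ghg_members):
    """Observations minus the multi-model mean (MMM) of GHG-only runs.
    ghg_members has shape (n_members, n_years); obs has shape (n_years,)."""
    ghg_mmm = np.asarray(ghg_members, dtype=float).mean(axis=0)
    return np.asarray(obs, dtype=float) - ghg_mmm

# Toy example: "observations" are a GHG-like warming trend plus a
# hypothetical 60-year oscillation standing in for IV and aerosol response.
years = np.arange(1901, 2015)
trend = 0.006 * (years - 1901)
oscillation = 0.2 * np.sin(2 * np.pi * (years - 1901) / 60)
obs = trend + oscillation

# Three idealized GHG-only members whose mean equals the trend.
ghg_members = np.stack([trend - 0.1, trend, trend + 0.1])

residual = ghg_residual(obs, ghg_members)
```

In this idealized case the residual recovers the oscillation exactly. With real data the residual also inherits any error in the simulated GHG response, so it is best read as an estimate rather than a clean separation.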
Could internal SST variability (\(\overrightarrow{o}\)) explain the difference between the simulated response to forcing and observations in these ocean basins? In Figure 8, we present the mean PS of SST for piC simulations from each CMIP6 model (models colder than observed are in blue and models warmer than observed are in red). We compare these PS to the PS for observed SST (solid black), the GHG-residual (dash-dotted black), and/or the ALL-residual (dotted black), avoiding time series with dramatic trends (see subplot legends). Simulated IV in most of the CMIP6 models used in this study does not match residual or observed low-frequency variability in NA (a), GT (b), or NARI (c). In CMIP5, SSTs are colder and IV at all frequencies is larger than in CMIP6, but no model shows an increase in spectral power at low frequencies for any SST index (not shown). There are, however, three CMIP6 models for which low-frequency IV in NA is not inconsistent with model physics: CNRM-ESM2-1 p1 (pink), IPSL-CM6A-LR p1 (blue), and CNRM-CM6-1 p1 (grey). Certainly, the SST response to forcing, oceanic internal variability, or both are not well represented in the CMIP ensembles, and this is the primary reason that coupled CMIP simulations cannot reproduce observed 20th-century Sahel rainfall.
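To make the spectral comparison concrete, the following Python sketch estimates a mean power spectrum from a long control-run time series by splitting it into segments, computing a periodogram for each, and averaging. This is one common approach and is offered only as an assumption about how a "mean PS" might be formed; the segment length, the normalization, and the function names are hypothetical and may differ from the method actually used.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram of an annual series, excluding the zero frequency.
    Returns frequencies in cycles per year and the corresponding power."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the mean before the FFT
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)       # d = 1 year sampling interval
    return freqs[1:], power[1:]

def mean_spectrum(control, seg_len):
    """Average the periodograms of consecutive, non-overlapping segments."""
    control = np.asarray(control, dtype=float)
    n_seg = control.size // seg_len
    segs = control[: n_seg * seg_len].reshape(n_seg, seg_len)
    spectra = np.array([periodogram(s)[1] for s in segs])
    freqs = periodogram(segs[0])[0]
    return freqs, spectra.mean(axis=0)

# Toy control run: 500 years containing a 50-year oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(500)
control = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.1, size=t.size)

freqs, mean_ps = mean_spectrum(control, seg_len=100)
peak_period = 1.0 / freqs[np.argmax(mean_ps)]   # in years
```

Choosing a segment length comparable to the observed record keeps the frequency resolution of simulated and observed spectra the same, at the cost of resolving only a few cycles of multi-decadal variability per segment.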