Early‐to‐Late Winter 20th Century North Atlantic Multidecadal Atmospheric Variability in Observations, CMIP5 and CMIP6

The strong multidecadal variability in North Atlantic (NA) winter atmospheric circulation is poorly understood and appears too weak in climate models. Recent research has shown peak atmospheric multidecadal variability over the NA in late winter, particularly March, linked to Atlantic multidecadal variability (AMV) of the ocean. Here a range of NA atmospheric circulation indices are assessed to provide a comprehensive picture of early‐to‐late winter low‐frequency variability and its representation in the latest generation of climate models (Coupled Model Intercomparison Project Phase 6 [CMIP6]). As found for CMIP5, CMIP6 models exhibit too‐weak multidecadal NA atmospheric variability compared to reanalysis data over the period 1862–2005. Consistent with previous research, the eastern part of the NA westerly jet (U700NA) exhibits peak low‐frequency variability in March. However, for NA‐wide jet speed and the NAO, low‐frequency variability and model‐reanalysis discrepancies are strongest in January and February, associated with too‐weak NA ocean‐atmosphere linkages.

strong late-winter (March) peak in multidecadal variability at the eastern part of the NA jet and its connection to indices of Atlantic Multidecadal Variability (AMV).
The analysis here is broadened to include basin-wide lower-tropospheric westerly jet speed and an assessment of multidecadal timescales characteristic of AMV. The main motivation for this is that jet speed explains a large part of the observed low-frequency component of winter NAO variability on multidecadal timescales (Woollings et al., 2012) and that Bracegirdle et al. (2018) found a clear model-reanalysis discrepancy in wintermean low-frequency power of jet speed on AMV timescales. Therefore, three indices of atmospheric variability are included: speed of the Atlantic lower-tropospheric westerly jet, speed of the eastern part of this jet (the "U700NA" diagnostic of Simpson et al. (2018)) and the NAO. Jet latitude is not included since it exhibits weak power on multidecadal timescales (Bracegirdle et al., 2018). With respect to assessing links to ocean variability, SST variability across the NA basin is considered along with two subregions. The NA subpolar gyre (SPG), which emerges as a dominant region on multidecadal timescales that has been identified as important in correctly capturing NAO-SST coupling (Peings et al., 2016;Woollings et al., 2015) and the subtropical NA, which has been found to influence decadal variability of midlatitude circulation via wave propagation (Gastineau & Frankignoul, 2015;Peings et al., 2016).
The main aims of this study are to (a) assess whether previous climate model-reanalysis discrepancies in NA low-frequency variability are reduced in the latest generation of climate models (CMIP6, Eyring et al., 2016) and the recently released version of the 20th Century Reanalysis (20CRv3, Slivinski et al., 2019) and (b) broaden previous studies to assess low-frequency variability across individual winter months and a range of indices of NA atmospheric circulation.

Reanalysis Data
The recently released 20CRv3 was used here. The only input data used in 20CRv3 are surface pressure observations in combination with a state-of-the-art data assimilation and modeling system with the aim of reconstructing atmospheric conditions in a consistent manner over the period of available observations (Slivinski et al., 2019). Compared to the previous version, 20CRv2, version 3 extends further back in time to the early 19th century and has an increased ensemble size (80 members compared to 56). Due to uncertainty relating to bias corrections of mid-19th century ship observations (Slivinski et al., 2019), and to aid direct comparisons to results presented in Bracegirdle et al. (2018), analysis was restricted to 1861 onwards. The variables used were (all monthly mean): zonal wind on 850 and 700 hPa, atmospheric pressure at mean sea level and skin temperature (which over open ocean is SST). A land mask was used to exclude land areas in calculating Atlantic SST indices.

Climate Model Data
Equivalent variables to those extracted from the 20CRv3 data set were assessed in output from historical simulations of CMIP5 and CMIP6 models. The historical simulations include known major anthropogenic and natural climate forcings (such as greenhouse gas concentration increases and major volcanic eruptions) over the period since the mid nineteenth century (Eyring et al., 2016). They are free-running and therefore do not in general follow the observed phases of natural internal variability. For most CMIP models, the historical simulations were repeated a number of times with slightly different initial conditions. These "realizations" provide a measure of the role of internal climate variability in simulated and observed trends over the historical period. The CMIP variables used here are "ua" (zonal wind on pressure levels), "psl" (atmospheric pressure at mean sea level), "ts" (skin temperature of the lower boundary of the atmosphere), and "sftlf" (land area fraction, used to mask out land areas). All available realizations for which output was found with the necessary variables required for this study are listed in Tables S3 and S4 in Supporting Information S1. This comprises 312 realizations from 81 models across both data sets. Four fewer CMIP5 models were available with required data for the SST diagnostics, therefore results using these diagnostics are based on a subset of 77 CMIP models (see Tables S1 and S2 in Supporting Information S1).

Atmospheric Circulation Indices
An EOF-based definition of the NAO is used in this study. It is calculated from mean sea level pressure (MSLP) over the NA, with latitude and longitude ranges following Hurrell (1995) (20-80°N, 90°W-40°E). The principal component time series of the leading EOF of year-to-year anomalies in winter-mean (DJF) or monthly mean MSLP is used to define the time series of NAO variability.
Speed of the NA tropospheric westerly jet follows the definition of Bracegirdle et al. (2018), referred to hereinafter as the jet speed index (JSI). Key aspects of this definition are as follows: (a) In order to maximize the number of CMIP models assessed, it is based on monthly means rather than the more standard daily mean fields. (b) It draws from the definition of Woollings et al. (2015) with zonal averaging between 60°W and 0°W at the 850 hPa level (region shown in Figure 1a). (c) For seasonal winter means, jet diagnostics are first calculated for each month before averaging over the standard winter months December-February.
The westerly wind index used by Simpson et al. (2018), U700NA, is also included here. It is defined as the areaweighted spatial mean westerly wind component at 700 hPa in box 50°-60°N and 40°-10°W (region shown in Figure 1a).

Sea-Surface Temperature Diagnostics
Three indices of SSTs in the NA are used, the locations of which are indicated in Figures 1b-1d. For linking SST and atmospheric variability, Peings et al. (2016) suggest two regions of the NA that are important for climate models to reproduce the observed strength of NAO-SST coupling, the SPG and subtropical NA. They are: (1) A basin-wide index of NA SSTs (SST AMV ), which is an area-weighted spatial mean (with land areas masked out) over the region traditionally used to define AMV indices (0°-60°N and 80°W-0°W) (Qasmi et al., 2017).
(2) An index of NA subpolar gyre SSTs (SST SPG ), which is an area-weighted spatial mean over the region 45°-65°N, 60°-20°W. This definition is chosen based on the region of the gyre as identified by Biri and Klein (2019). (3) An index of NA subtropical SSTs (SST ST ). This is a spatial mean over the region 5°-20°N, 60°-20°W and broadly captures the southern section of the characteristic AMV horseshoe pattern (Gastineau & Frankignoul, 2015), which is evident in Figure 1b.
To generate time series of these indices, each calendar month is treated separately (e.g., time series of all the Januarys over the range of years) after which a low-pass filter is applied to retain multidecadal timescales (>10 years). Each time series is linearly detrended before application of the filter.

Results
In the first part of this section, relevant winter-mean diagnostics from Bracegirdle et al. (2018) are repeated for the updated climate models (CMIP6 in place of CMIP5) and reanalysis (20CRv3 in place of 20CRv2c). This shows whether model-reanalysis discrepancies in low-frequency NA variability have changed significantly in updated versions of climate models and reanalyses. This is followed by a month-by-month early-to-late winter assessment of reanalysis-derived multidecadal variability and its representation in CMIP climate models. To maximize the number of realizations available, this further analysis combines historical simulations from both CMIP5 and CMIP6 (referred to hereinafter as CMIP). Figure 2 shows proportional power spectral density (PSD) periodograms from time series of winter-mean (DJF) atmospheric circulation indices (to aid comparison between periodograms of different diagnostics, the displayed "proportional PSD" values are proportional to the mean of the PSD values across the frequency bins in each periodogram). This shows that the new CMIP data set (CMIP6) produces very similar results to those based on the preceding generation (CMIP5). The main point to note is that, as found for CMIP5, the CMIP6 historical realizations do not reproduce the statistically significant low-frequency peak in JSI that is evident in current and previous versions of 20CR (Bracegirdle et al., 2018;Woollings et al., 2014). At this ∼70 year period there is a clear separation between the 2.5th and 97.5th percentile ranges of the CMIP climate model realizations and the 80 members of the 20CRv3 ensemble (Figures 2e and 2f). Indeed, it is notable that the reanalysis ensemble spread is particularly narrow for this timescale, demonstrating high confidence as was seen for the earlier 20CRv2c version used in Bracegirdle et al. (2018). For NAO and U700NA, this is less clear for these winter-mean diagnostics.
A spectral analysis of the three AMV-related SST diagnostics shows that the observed significant multidecadal ocean variability is stronger than most CMIP ensemble realizations (consistent with power spectra assessments of CMIP5 shown in Zhang and Wang (2013) and Cheung et al. (2017)), but with no clear separation between the 20CRv3 and CMIP ensembles ( Figure S1 in Supporting Information S1). This illustrates that although part of the 20CRv3-CMIP discrepancy in low-frequency JSI variability may be due to weak ocean variability, other factors are likely to play a role (discussed in Section 5).  Figure 3a shows that low-frequency power in 20CRv3 is strong in all winter months (DJFM) for JSI. For the NAO, low-frequency variability is also strongest in mid-to-late winter, but is weak in December. This largely explains the weaker low-frequency power in the NAO for the DJF means shown in Figure 2. For U700NA, the low-frequency power is strongest in late winter, peaking in March. In canonical winter months (December-February), the low-frequency power is stronger for JSI than for NAO or U700NA, but in March U700NA is the strongest.
The months and diagnostics with the strongest low-frequency power in 20CRv3 are also, perhaps unsurprisingly, the months for which CMIP models struggle most to reproduce the power seen in 20CRv3 (Figure 3b). In January, both the NAO and JSI reanalysis-estimated low-frequency power exceeds the 98th percentile of the distribution of the 312 CMIP realizations. For JSI it exceeds the 99th percentile. For both indices, the CMIP-relative power remains above the 90th percentile through March before dropping dramatically in April. For U700NA the 20CRv3 low-frequency power reaches the 96.5th percentile in March. The overall picture is then that the CMIP models do not in general reproduce the strength of reanalysis-derived low-frequency atmospheric variability in January, February and March. The onset of strong variability relative to the CMIP ensemble range occurs earlier for JSI (December) and later for NAO (January) and U700NA (February).
To assess potential links between climate model-reanalysis discrepancies in low-frequency variability and model representation of AMV-atmosphere linkages, Figure 4 shows reanalysis and CMIP-derived correlations between time series of NA SST and atmospheric circulation indices, where the time series have been low-pass filtered at decadal (>10 years) timescales. They are shown for each month between December and March. Negative and statistically significant reanalysis-derived correlations between the three atmospheric indices and SST AMV are evident in Figures 4a, 4d, and 4g, most consistently in February. In this month both NAO and JSI exhibit correlations that approach the lower part of the 2.5th-97.5th percentile range of the combined CMIP5 and CMIP6 data sets.
Consistent with this, broadly stronger correlations are seen for the NA SST subregions (SST SPG and SST ST ) (Figures 4b, 4c, 4e, 4f, 4h, and 4i). For the NAO, reanalysis-derived correlations with SST SPG and SST ST are strongest in January and February, although only marginally compared to March (Figures 4b and 4c). Notably, in January the reanalysis ensemble lies outside the 2.5th-97.5th percentile range of the CMIP model realizations, particularly for SST SPG . For JSI, the reanalysis correlations are strongest in January and February for SST SPG , but do not clearly exceed the CMIP range (Figure 4h). For SST ST correlations with JSI are strongest and most significant in December and January (Figure 4f), suggesting more of a subtropical linkage for the strong early season JSI variability seen in Figure 3. For U700NA, the strongest links to SST variability are for the SPG region in March, when reanalysis-derived correlations also lie closest to the edge of the CMIP ensemble range (Figure 4e). The strong correlations between U700NA and SST SPG are not unexpected since there is considerable overlap between the two regions ( Figure 1).
Overall Figures 3 and 4 present a picture of strong low-frequency atmospheric variability in mid-to-late winter that is associated with strong atmosphere-SST correlations in the same months, particularly for the SPG region, but with significant links to the subtropical NA.

Conclusions and Discussion
In this paper, an assessment of winter low-frequency atmospheric variability over the NA is presented comparing historical simulations from CMIP5 and CMIP6 with reanalysis data from 20CRv3 over the period 1862-2005. The aims are (a) to assess whether previous climate model-reanalysis discrepancies in NA low-frequency variability are reduced in the current generation of climate models and (b) broaden previous studies to assess low-frequency variability across individual winter months and a range of indices of NA atmospheric circulation.
With regard to the first aim, it is shown that the previously documented climate model-reanalysis (CMIP5-20CRv2c) discrepancy in low-frequency variability in winter-mean JSI (Bracegirdle et al., 2018) is still apparent for the current generation data sets (CMIP6 and 20CRv3). It therefore remains important to develop a clearer characterization of this discrepancy, which is a key motivation for the second aim of conducting a broader monthby-month analysis.
Considering individual months from December to March, it was found that reanalysis-derived jet speed (JSI), NAO, and U700NA all exhibit strong multidecadal variability at different points during winter; clearest in both January and February for JSI and NAO and in March for U700NA. Overall JSI variability is strong and constant through the winter from December to March, which helps to explain its dominance in winter-mean diagnostics such as the DJF means used by Bracegirdle et al. (2018). For the NAO, reanalysis-derived multidecadal low-frequency power and associated CMIP model discrepancies are similar in magnitude to that seen for JSI. For the CMIP ensemble mean there is no significant month-by-month variability in low-frequency power and large interrealization spreads for specific models (not shown). This provides evidence for a major role for internal variability in the specific timing of low-frequency power in the historical record. However, the caveat being that the relatively weak low-frequency power in climate models could be a consequence of underrepresentation of forced variability over the historical period.
One puzzle of the late-winter peak in eastern-NA (U700NA) atmospheric variability identified by Simpson et al. (2018) is difficulty in explaining why it should occur preferentially in late winter rather than mid-winter.
As they suggest, one possibility is that anomalies grow during the season in response to anomalous SSTs (e.g., Woollings et al., 2012). Taking a basin-wide perspective, the strong mid-to-late winter JSI variability found here is more concurrent with months of strong atmosphere-ocean interactions, with climatologically high ocean-atmosphere heat fluxes and large mixed layer depths widespread across the northern NA in December-March (e.g., de Boyer Montegut et al., 2004). Consistent with this, Gastineau and Frankignoul (2015) showed evidence that on decadal timescales NA atmosphere-SST links are strongest in mid-to-late winter (JFM) associated with a clearer annual reemergence of longer-term (decadal) ocean temperature anomalies below the seasonal thermocline as the mixed layer deepens in winter. This may help to explain the marked increase in strength in SST-atmosphere correlations for NAO and U700NA between December and January in Figure 4.
In mid-to-late winter months historical realizations of the CMIP5 and CMIP6 models generally fail to reproduce the reanalysis-derived low-frequency variability ( Figure 3b) and associated links to SSTs (Figure 4). Since CMIP6 models still exhibit weak low-frequency variability compared to reanalysis data, the priority of explaining the general lack of NA multidecadal variability in climate models remains. The results of the spectral analysis and atmosphere-ocean time series correlations presented here are consistent with previous evidence for explanations involving both relatively weak CMIP-simulated NA SST variability (Cheung et al., 2017;Zhang & Wang, 2013) and too-weak model-simulated atmospheric responses to anomalous SSTs (Kim et al., 2018;Peings et al., 2016;Simpson et al., 2018;Wang et al., 2017). On the latter point a key piece of evidence is that even with the historical evolution of SSTs prescribed, models still do not reproduce the magnitude of observed NA atmospheric variability (Kim et al., 2018;Simpson et al., 2018). However, a clear causal link to specific model deficiencies has not yet been made. Equatorward jet latitude biases may play a role (Ruggieri et al., 2021), whereby models with more equatorward jets exhibit weaker atmospheric responses to AMV forcing. Equatorward NA storm track and jet latitude biases that were identified in CMIP5 do indeed persist in CMIP6, although reduced slightly (Bracegirdle et al., 2022;Priestley et al., 2020;Simpson et al., 2020), suggesting a potential role for this effect in explaining the continued weak low-frequency atmospheric variability in CMIP. However, no clear link was found here between the low-frequency jet speed variability shown in Figure 3 and jet latitude across the CMIP models (not shown), although the large internal variability in models with small ensemble sizes introduces significant uncertainty that could obscure any effect. Other suggested, and potentially related, contributory factors include model resolution (Czaja et al., 2019;Scaife et al., 2019), the representation of stratosphere-troposphere interactions (Omrani et al., 2014), positive coupled ocean-atmosphere feedbacks (e.g., Osso et al., 2020) and teleconnections to the rest of the planet (Hardiman et al., 2019;Hurrell et al., 2004).
A major implication of the results presented in this study is that missing multidecadal variability in climate models potentially affects different regions at different points in the winter season depending on regional links to large-scale patterns of variability. It is shown that long-standing model-reanalysis discrepancies still persist in the latest generation of climate models (CMIP6). It is therefore imperative that research continues in this area to (a) better understand the natural and anthropogenic causes of NA multidecadal variability in the historical period, (b) evaluate model representation of these causes, (c) continue to refine long-term reanalyses and reconstructions, and (d) identify ways to improve models and/or adjust predictions to take account of their biases.

Data Availability Statement
This study used existing openly available data sets. The data that support the findings of this study are openly available at the following URLs: CMIP5 from https://esgf-node.llnl.gov/search/cmip5/; CMIP6 from https:// esgf-index1.ceda.ac.uk/search/cmip6-ceda/; and 20CRv3 from https://portal.nersc.gov/project/20C_Reanalysis/. The specific CMIP5 and CMIP6 run and version numbers of all 312 simulations that were analyzed are listed in Tables S3 and S4 in Supporting Information S1.