Underpredicted ENSO Teleconnections in Seasonal Forecasts

The El Niño‐Southern Oscillation (ENSO) influences climate variability across the globe. ENSO is highly predictable on seasonal timescales and therefore its teleconnections are a source of extratropical forecast skill. To fully harness this predictability, teleconnections must be represented accurately in seasonal forecasts. We find that a multimodel ensemble from five seasonal forecast systems can successfully capture the spatial structure of the late winter (JFM) El Niño teleconnection to the North Atlantic via North America, but the simulated amplitude is half of that observed. We find that weak amplitude teleconnections exist in all five models throughout the troposphere, and that the La Niña teleconnection is also weak. We find evidence that tropical forcing of the El Niño teleconnection is not underestimated and instead, deficiencies are likely to emerge in the extratropics. We investigate the impact of underestimated teleconnection strength on North Atlantic winter predictability, including its relevance to the signal‐to‐noise paradox.

While there is extensive literature on the winter extratropical response to ENSO in observations and free-running GCMs, existing literature on the modeled amplitude of the extratropical response, or on the performance of seasonal forecasts in capturing the response, is relatively limited. Studies which consider the ability of seasonal forecasts to capture ENSO teleconnections typically use a single seasonal forecast system, and focus on a particular aspect of teleconnections in observations and models rather than seasonal forecast capability alone. Ayarzagüena et al. (2018) found that the early winter (ND) teleconnection from El Niño to the North Pacific is weaker than observed in the Met Office GloSea5 system (MacLachlan et al., 2015). Abid et al. (2021) found that in December, North Pacific upper tropospheric geopotential height regressed onto ENSO is underestimated in the ECMWF SEAS5 system (Johnson et al., 2019). Very few studies have used multimodel forecasts to examine their representation of ENSO-extratropical teleconnections. L'Heureux et al. (2017) showed that ENSO and the Arctic Oscillation are more strongly correlated than observed throughout winter in the North American Multimodel Ensemble (Kirtman et al., 2014), but they did not examine the strength of modeled teleconnections relative to observed. Deser et al. (2017) found that the amplitude of the DJF Aleutian Low response to ENSO is not significantly different to observed in three free-running models and a set of pacemaker simulations.
We investigate the performance of five seasonal forecast systems in capturing teleconnections from ENSO to the North Pacific, North America, and the North Atlantic. We then seek to establish the region in which teleconnection errors originate. We then further examine the North Pacific-North Atlantic teleconnection pathway (cf. Honda et al., 2001) in order to understand the effect of errors in modeled teleconnections on North Atlantic predictability in the context of the signal-to-noise paradox.

Data and Methods
We use hindcasts for the winters 1993/1994 to 2016/2017 from five seasonal forecast systems. Met Office GloSea5 (MacLachlan et al., 2015; hereafter UKMO) hindcasts have 21 members, with 7 each initialized on 25th October, 1st November, and 8th November. Météo-France System 8 (Voldoire et al., 2019; hereafter M-F) hindcasts have 25 members, with 1 initialized on 1st November, 12 initialized on the last Thursday of October, and 12 initialized on the penultimate Thursday of October. CMCC-SPS3 (Sanna et al., 2016; hereafter CMCC), DWD GCFS 2 (Fröhlich et al., 2021; hereafter DWD), and ECMWF SEAS5 (Johnson et al., 2019; hereafter ECMWF) hindcasts respectively have 40, 30, and 25 members initialized on 1st November. Where multimodel means have been computed, ensemble members are weighted equally, although results are robust to equal weighting of models (not shown). The JRA-55 (Kobayashi et al., 2015) reanalysis with data from 1979/80 to 2016/17 is used for observations, and the ERA5 (Hersbach et al., 2020) reanalysis over the same period is used for comparison. Additionally, the Global Precipitation Climatology Project v2.3 (Adler et al., 2003; hereafter GPCP) observational dataset from 1979/1980 to 2016/2017 is used for precipitation, and the HadISST 2.2 dataset (Kennedy et al., 2017; hereafter HadISST) from 1950/51 to 2014/15 is used for sea surface temperatures. For sea surface temperatures, December-March (DJFM) means are used. For all other fields, January-March (JFM) means are used. Late winter is the focus of this work as the North Atlantic response to ENSO is most robust during this period (Moron & Gouirand, 2003).
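The two weighting choices mentioned above (equal weight per ensemble member versus equal weight per model) can be illustrated with a minimal sketch. The ensemble sizes are those quoted in the text; the function names and the per-model values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

# Ensemble sizes quoted in the text.
ENSEMBLE_SIZES = {"UKMO": 21, "M-F": 25, "CMCC": 40, "DWD": 30, "ECMWF": 25}


def member_weighted_mean(model_means, sizes):
    """Multimodel mean in which every ensemble member carries equal weight."""
    total_members = sum(sizes[name] for name in model_means)
    return sum(model_means[name] * sizes[name] for name in model_means) / total_members


def model_weighted_mean(model_means):
    """Multimodel mean in which every model carries equal weight."""
    return mean(model_means.values())


# Hypothetical ensemble-mean values (arbitrary units) at a single grid point.
example = {"UKMO": 40.0, "M-F": 30.0, "CMCC": 50.0, "DWD": 45.0, "ECMWF": 42.0}
by_member = member_weighted_mean(example, ENSEMBLE_SIZES)
by_model = model_weighted_mean(example)
```

With member weighting, the larger CMCC and DWD ensembles contribute more to the multimodel mean than they would under model weighting.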
The Niño 3.4 index (Trenberth, 1997), defined as the mean sea surface temperature anomaly ΔT between 190 and 240°E and between 5°S and 5°N, is computed for the JRA-55 reanalysis and is used to split winters into El Niño (ΔT > 0.5 K), La Niña (ΔT < −0.5 K), and neutral (|ΔT| < 0.5 K) phases. Of the 24 hindcast winters, 6 are classified as El Niño, 9 as La Niña, and 9 as neutral. The 38 winters in the reanalysis period consist of 10 El Niño, 11 La Niña, and 17 neutral seasons. Hindcast Niño 3.4 indices correlate strongly with observations (r is between 0.96 and 0.99 for the ensemble mean of each model).
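The phase classification can be sketched as a simple threshold test. The function name, the winter labels, and the anomaly values below are hypothetical. Treating an anomaly of exactly ±0.5 K as neutral is an assumption, since the text does not specify the boundary case.

```python
from collections import Counter


def classify_enso(delta_t, threshold=0.5):
    """Classify a winter from its Niño 3.4 anomaly (in K)."""
    if delta_t > threshold:
        return "El Niño"
    if delta_t < -threshold:
        return "La Niña"
    # Assumption: anomalies exactly at the threshold are counted as neutral.
    return "neutral"


# Hypothetical winter-mean anomalies, for illustration only.
example_anomalies = {"winter_A": 1.8, "winter_B": -1.1, "winter_C": -0.2, "winter_D": 0.3}
phases = {winter: classify_enso(dt) for winter, dt in example_anomalies.items()}
counts = Counter(phases.values())
```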
For a given field, the response to El Niño is defined as the composite mean of El Niño years for that field with the neutral composite subtracted. Similarly, the response to La Niña is the La Niña composite with neutral subtracted. For the purpose of quantifying uncertainty, standard deviations for observed El Niño responses are computed as the standard deviation of all El Niño years for the relevant dataset. In order to compare model standard deviations with observations, a sampling technique is used. For each sample, a neutral mean is computed using a random member from each neutral year. This is then subtracted from a random ensemble member from an El Niño year. Ten thousand samples are taken, for which the standard deviation is computed.
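The composite response and the sampling procedure can be sketched as follows for a scalar index. The function names and the data layout (dictionaries keyed by year) are hypothetical. The text does not say how the El Niño year is picked for each sample, so drawing it at random is an assumption.

```python
import random
from statistics import mean, stdev


def composite_response(field_by_year, phase_by_year, phase):
    """Composite mean of `phase` years minus the neutral composite."""
    active = [v for y, v in field_by_year.items() if phase_by_year[y] == phase]
    neutral = [v for y, v in field_by_year.items() if phase_by_year[y] == "neutral"]
    return mean(active) - mean(neutral)


def sampled_response_std(members_by_year, phase_by_year, phase,
                         n_samples=10_000, seed=0):
    """Standard deviation of single-member responses built by resampling."""
    rng = random.Random(seed)
    neutral_years = [y for y in members_by_year if phase_by_year[y] == "neutral"]
    active_years = [y for y in members_by_year if phase_by_year[y] == phase]
    samples = []
    for _ in range(n_samples):
        # Neutral mean built from one random member per neutral year.
        neutral_mean = mean([rng.choice(members_by_year[y]) for y in neutral_years])
        # Assumption: the El Niño year itself is also drawn at random.
        year = rng.choice(active_years)
        samples.append(rng.choice(members_by_year[year]) - neutral_mean)
    return stdev(samples)
```

For gridded fields the same steps would be applied at each grid point, or to a box-mean index.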
Tropical East Pacific precipitation (TEP) is defined as the mean precipitation between 160 and 240°E and 5°S-5°N, and is strongly coupled to ENSO. Tropical West Pacific precipitation (TWP) is defined as the mean precipitation between 110 and 150°E and 0-20°N. These boxes were chosen to capture the strongest precipitation responses during both El Niño and La Niña years, and are similar to the TEP and TWP boxes used in Scaife et al. (2017). TWP is considered as it has a known impact on the extratropical Northern Hemisphere, including the North Atlantic Oscillation (Kucharski et al., 2006; Scaife et al., 2017; Scaife, Ferranti, et al., 2019).
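A box-mean index of this kind can be sketched as below. The box limits are those given in the text. The cosine-latitude area weighting, the function name, and the nested-list grid layout are assumptions, since the text states only that a mean over the box is taken.

```python
import math

# Box limits from the text (degrees north, degrees east).
TEP_BOX = {"lat_range": (-5.0, 5.0), "lon_range": (160.0, 240.0)}
TWP_BOX = {"lat_range": (0.0, 20.0), "lon_range": (110.0, 150.0)}


def box_mean(field, lats, lons, lat_range, lon_range):
    """Cosine-latitude-weighted mean of field[i][j] over a lat-lon box.

    `field` is indexed as [latitude][longitude]; longitudes run 0-360°E.
    """
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    weighted_sum = 0.0
    weight_total = 0.0
    for i, lat in enumerate(lats):
        if not lat_min <= lat <= lat_max:
            continue
        weight = math.cos(math.radians(lat))  # assumed area weighting
        for j, lon in enumerate(lons):
            if lon_min <= lon <= lon_max:
                weighted_sum += weight * field[i][j]
                weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no grid points fall inside the box")
    return weighted_sum / weight_total
```

A call such as `box_mean(precip, lats, lons, **TEP_BOX)` would then give the TEP index for one seasonal-mean field.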
The generation of Rossby waves requires a Rossby wave source (RWS) S (Sardeshmukh & Hoskins, 1988):

$$S = -\nabla \cdot (\mathbf{u}_\chi \zeta) = -\mathbf{u}_\chi \cdot \nabla \zeta - \zeta \nabla \cdot \mathbf{u}_\chi \qquad (1)$$

where ζ is the absolute vorticity and $\mathbf{u}_\chi$ is the divergent component of the horizontal wind. Increased tropical precipitation due to surface heating is associated with a baroclinic divergence response, with convergence near the surface and divergence at the level of convective outflow. Tropical divergence anomalies lead to divergent winds over a broader region, extending into areas with more favorable conditions for Rossby wave propagation. We found that the upper level divergent response to ENSO in the Pacific is highly robust at 200 hPa, but it is not robust at lower model levels. This level is consistent with previous studies (e.g., Sardeshmukh & Hoskins, 1988; Scaife et al., 2017). Diagnosing the strength of the teleconnection forcing using the full Rossby wave source presents difficulties, as any underprediction of Rossby wave source strength could be caused by, as well as the cause of, weak teleconnections: Rossby waves themselves cause anomalous vorticity and divergence, which in turn leads to a Rossby wave source. Previous studies (Mezzina et al., 2020; Qin & Robinson, 1993) have identified that the first term of Equation 1, associated with advection of vorticity by the divergent wind, is more characteristic of the tropical forcing of the teleconnection, while the second term is more dependent on secondary sources. We define the advective Rossby wave source $S_{\mathrm{adv}}$ as

$$S_{\mathrm{adv}} = -\mathbf{u}'_\chi \cdot \nabla \bar{\zeta}$$

where $\mathbf{u}'_\chi$ is the anomalous component of the divergent wind, and $\bar{\zeta}$ is the background absolute vorticity, computed using neutral ENSO years only. This is similar to the Tropical Rossby Wave Source computed in Mezzina et al. (2020), except they use all years to compute the background vorticity.
They identify a robust negative response to El Niño in the (Northern Hemisphere) subtropical central Pacific which is primarily a result of the overlap of anomalous divergent winds caused by increased tropical precipitation, and a large background vorticity gradient caused by the subtropical jet stream. We quantify this response using a box region (CP) covering 160-200°E and 20-35°N. We also find a robust positive response to its west and so we define a region (WP) covering 110-150°E and 20-35°N. The regions used for precipitation and advective Rossby wave source (TWP, TEP, WP, CP) are shown in Figure S1 in Supporting Information S1.
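As an illustration of the advective Rossby wave source, the sketch below evaluates minus the anomalous divergent wind dotted with the gradient of the background absolute vorticity, using centered differences on a regular latitude-longitude grid. The function name, the grid layout, the Earth radius constant, and the finite-difference scheme are all assumptions and not necessarily what was used for the results.

```python
import math

EARTH_RADIUS = 6.371e6  # metres; assumed constant


def advective_rws(u_chi_anom, v_chi_anom, zeta_bar, lats, lons):
    """Return -u'_chi . grad(zeta_bar) at interior grid points.

    Inputs are nested lists indexed [latitude][longitude]: anomalous divergent
    wind components (m/s) and background absolute vorticity (1/s). Edge points
    are returned as None because centered differences are undefined there.
    """
    nlat, nlon = len(lats), len(lons)
    s_adv = [[None] * nlon for _ in range(nlat)]
    for i in range(1, nlat - 1):
        cos_lat = math.cos(math.radians(lats[i]))
        dphi = math.radians(lats[i + 1] - lats[i - 1])
        for j in range(1, nlon - 1):
            dlam = math.radians(lons[j + 1] - lons[j - 1])
            # Zonal and meridional gradients of background vorticity on a sphere.
            dzeta_dx = (zeta_bar[i][j + 1] - zeta_bar[i][j - 1]) / (
                EARTH_RADIUS * cos_lat * dlam)
            dzeta_dy = (zeta_bar[i + 1][j] - zeta_bar[i - 1][j]) / (
                EARTH_RADIUS * dphi)
            s_adv[i][j] = -(u_chi_anom[i][j] * dzeta_dx
                            + v_chi_anom[i][j] * dzeta_dy)
    return s_adv
```

In this sketch, a northward anomalous divergent wind crossing a region where background vorticity increases poleward gives a negative value, which has the same sign as the central Pacific response described above.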
Due to the observed asymmetry of the North Pacific response to El Niño and La Niña, boxed regions of 180-220°E and 40-60°N for El Niño, and 170-210°E and 30-50°N for La Niña, are used when comparing the strength of the extratropical responses. These boxes are chosen in order to capture the strongest response in each phase.
The Aleutian Low is defined as the mean 300 hPa geopotential height z between 170 and 220°E and 30-60°N. This is a region of climatological low geopotential height with high interannual variability, and the boundaries are chosen to capture the regions with the strongest responses during El Niño and La Niña years. Aleutian Low composites are taken by defining negative and positive phases, where the Aleutian Low is deeper (more negative) or shallower (less negative) than the mean respectively. As the correlation between models and observation is less strong for the Aleutian Low than it is for the Niño 3.4 index (0.5 < r < 0.7 for the ensemble mean of models), model Aleutian Low indices are used to create model composites instead of the observed index. The 38 reanalysis winters consist of 17 negative seasons and 21 positive seasons. The multimodel ensemble mean evenly splits into 12 negative and 12 positive winters. DWD has 11 negative winters out of 24, UKMO and M-F both have 12, and ECMWF and CMCC both have 14.

Weak Teleconnections
Figure 1a shows the response of 300 hPa geopotential height to El Niño computed using JRA-55 data. A significant increase in geopotential height occurs throughout most of the tropics due to tropical warming and the canonical Pacific-North American pattern (e.g., Horel & Wallace, 1981) is present. In the North Atlantic a tripole pattern emerges, with a large positive anomaly across the basin just south of Greenland, and negative anomalies to the north east and south west. Figure 1b shows the multimodel response to El Niño. The simulated pattern is very similar to that observed but the strength of the response in the extratropical Northern Hemisphere is severely underestimated. This is most evident over the North Pacific, where the magnitude of the composite response is greater than 90 m over a large region in observations but does not exceed 60 m anywhere in the multimodel mean. Figures 1c-1g show the individual model responses. Each model captures the structure of the response over the North Pacific and North America, and all except the Météo-France model show a North Atlantic pattern which resembles the observed pattern. The underestimated amplitude of the teleconnection is present in all five models. Additionally, in all models the North Pacific response is tilted on a north west-south east axis relative to observations. While the response over North America appears to be weak, significance is lower than in the North Pacific.
While there is reasonable agreement between observed and modeled teleconnections in the North Atlantic, the observed North Atlantic pattern for this period closely resembles the Atlantic wavetrain response to strong El Niño events found in Toniazzo and Scaife (2006). Removing the four El Niño events (1983, 1992, 1998, 2016) classed as strong by their definition (ΔT > 1.5 K) from the observed composite leads to stronger agreement with the multimodel mean (not shown). This may mean that models are unable to capture the Tropical Atlantic pathway for strong El Niño events found in Toniazzo and Scaife (2006). However, the limited availability of observed El Niño years means that caution should be taken when comparing strength-based subsamples of the available data. Figure 2a shows that the extratropical Pacific response to El Niño is underestimated throughout the troposphere and in all five models, by about half in four models and even more than this in M-F. Figure 2b shows that the response to La Niña is also underestimated in all models, at all tropospheric levels except for 100 hPa. The modeled El Niño and La Niña responses are not equally weak, with a less weak response to La Niña in most models. For this reason, we focus on the response to El Niño in the next section. Figure S2 in Supporting Information S1 shows maps of the 300 hPa geopotential height response to La Niña, calculated using JRA-55 (a) and the multimodel ensemble mean (b). The peak strength of the response to La Niña in the North Pacific is strongly underestimated in the multimodel mean, but the spatial extent of the positive geopotential height anomaly is broader than observed.

Errors in the Teleconnection Pathway
We next consider where the model errors emerge in the chain of processes involved in the teleconnection. Figure 3 shows the strength of the response of six indices to El Niño in observational datasets and in hindcasts, with ± two standard errors from the mean shown to represent uncertainty. Figure 3a shows that all five models actually overestimate changes in Niño 3.4 sea surface temperatures relative to JRA-55, although the model means are within the uncertainty range. This suggests that if sea surface temperatures were perfectly predicted, the underestimation of the modeled teleconnection would be even worse. When comparing to ERA5, there is almost no error in the multimodel mean response. All five models are above the uncertainty range of HadISST, although it should be noted that the HadISST time period (1950/51-2014/15) is different to the reanalysis period. We found strong agreement between HadISST and the two reanalyses when only considering the years which are in all observational datasets (1979/80-2014/15; not shown). Modeled responses of East Pacific (TEP) rainfall to El Niño (Figure 3b) are similar to those in JRA-55, with two models underestimating the response and three models overestimating, and the difference between the JRA-55 and multimodel mean is much smaller than their respective uncertainties. ERA5 and GPCP are in strong agreement with JRA-55 for the TEP response to El Niño. For West Pacific (TWP) rainfall (Figure 3c) all three observational datasets and all five models show similar responses, with model responses well within the uncertainty range of each observational dataset. Figure 3d shows the 200 hPa advective RWS response to El Niño in the central Pacific region, calculated using JRA-55, ERA5 and each model. The observational estimates have very similar mean responses.
The multimodel mean is slightly weaker than observations, but it is well within the margin of uncertainty and all five individual models are within the two standard error range of both reanalyses. Figure 3e shows the 200 hPa advective RWS response in the west Pacific. The multimodel mean response is weaker than that of CP advective RWS (around 85% of the JRA-55 response), but it is still well within the margin of error of both reanalyses. Finally, the extratropical anomalies in Figure 3f demonstrate the significance of the weak response of North Pacific geopotential height to El Niño, as the strength of the multimodel mean is below the two standard error range from the JRA-55 mean. JRA-55 and ERA5 are in strong agreement on both the mean response and its uncertainty.
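The comparison against a two standard error range can be sketched as below. The text does not give the formula for the standard error, so taking it as the sample standard deviation divided by the square root of the number of years is an assumption. The function names and values are hypothetical.

```python
import math
from statistics import mean, stdev


def two_standard_error_range(values):
    """Return (lower, upper) bounds of the mean ± two standard errors.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    centre = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    return centre - 2.0 * standard_error, centre + 2.0 * standard_error


def is_within(value, bounds):
    """True if `value` lies inside the closed interval `bounds`."""
    lower, upper = bounds
    return lower <= value <= upper


# Hypothetical observed responses (arbitrary units), one per El Niño year.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
bounds = two_standard_error_range(observed)
```

A modeled mean response would then be described as consistent with observations when `is_within(model_mean, bounds)` is true.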
If it is assumed that the multimodel advective RWS response is systematically weak, the extent to which it is weak (a factor of around 0.85 relative to JRA-55 for CP and 0.9 for WP) is not sufficient to explain the weak North Pacific response (a factor of around 0.55) without the influence of highly nonlinear mechanisms further ahead in the teleconnection pathway. Therefore, the tropical forcing of the teleconnection does not appear to be the source of the weak El Niño teleconnection. For La Niña we found that the western Pacific precipitation and advective RWS responses are underestimated in all models except DWD, with high significance for the multimodel mean and three of the models in the case of TWP precipitation (not shown). This makes it more difficult to separate tropical and extratropical effects which contribute to underestimation of teleconnection strength.
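The size of the gap between forcing and response can be made explicit with a short calculation using the approximate factors quoted above. The premise that the extratropical response scales linearly with the forcing is an illustrative assumption and not a result of this study.

```python
# Approximate multimodel-to-JRA-55 ratios quoted in the text.
forcing_factor = {"CP": 0.85, "WP": 0.9}
north_pacific_factor = 0.55

# Under an assumed linear scaling, the part of the weak North Pacific
# response that is left unexplained by the forcing is the ratio of the two.
residual_factor = {
    region: north_pacific_factor / factor
    for region, factor in forcing_factor.items()
}
```

Under that assumption the residual factors are roughly 0.65 (CP) and 0.61 (WP), so most of the weakness would remain even if the forcing deficit were fully accounted for.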
Finally, we consider the link from the Pacific to the Atlantic. Figure 4a shows the geopotential height response to a unit deepening of the Aleutian Low, for JRA-55 data. Figure 4b shows the same, but for the multimodel ensemble. At mid-latitudes, the geopotential height response to deepening of the Aleutian Low is relatively accurate in the multimodel ensemble: a negative NAO-like pattern (cf. Honda et al., 2001) exists in both observations and the multimodel mean, and a similar dipole anomaly occurs over North America. The amplitude of the pattern is much closer to the observed link than is the case for the response to ENSO. This suggests that improving the strength of the Aleutian Low response to ENSO would improve prediction over North America and in the North Atlantic sector, as well as in the North Pacific sector. Figures 4c-4g show the response for individual models. Each model has a similar response to the multimodel mean and observations, although they vary in how far the dipole extends eastwards into the North Atlantic, and the North Atlantic response in the ECMWF model is weaker than others.
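The response to a unit deepening of the Aleutian Low can be sketched for a single grid point as the difference between deep-year and shallow-year composites, divided by the absolute difference in the mean Aleutian Low index for the same two sets of years. The function name and data layout are hypothetical. Assigning years exactly equal to the mean to the shallow set is an assumption.

```python
from statistics import mean


def unit_deepening_response(field_by_year, aleutian_low_by_year):
    """Field change per unit deepening of the Aleutian Low index."""
    index_mean = mean(aleutian_low_by_year.values())
    deep_years = [y for y, al in aleutian_low_by_year.items() if al < index_mean]
    # Assumption: years exactly at the mean are treated as shallow.
    shallow_years = [y for y, al in aleutian_low_by_year.items() if al >= index_mean]

    field_difference = (mean([field_by_year[y] for y in deep_years])
                        - mean([field_by_year[y] for y in shallow_years]))
    index_difference = abs(mean([aleutian_low_by_year[y] for y in deep_years])
                           - mean([aleutian_low_by_year[y] for y in shallow_years]))
    return field_difference / index_difference
```

As a sanity check, passing the index itself as the field returns −1, that is, a unit deepening.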

Discussion and Conclusions
Underprediction of teleconnections has serious ramifications for seasonal forecasts. If total ensemble variance of forecasts is realistic, halving the strength of teleconnections will degrade model predictability and reduce skill. These results also have a bearing on the signal-to-noise paradox on interannual timescales. However, this problem is also known to exist in longer-term predictions (Eade et al., 2014; Smith et al., 2020) where ENSO is not the main driver of predictable signals. Therefore, while ENSO is unlikely to be the direct cause of the paradox, its weak teleconnections may have the same origin, as has also been found for the Quasi-Biennial Oscillation (O'Reilly et al., 2019). It is therefore necessary to establish the cause of weak teleconnections in general, in order to not only improve modeled ENSO-associated interannual variability but potentially to also improve modeled variability due to other drivers and at different timescales. Leading hypotheses for this more general problem include transient eddy feedback (Hardiman et al., 2022; Scaife, Camp, et al., 2019) and ocean-atmosphere interaction (Ossó et al., 2020; Zhang & Kirtman, 2019). ENSO is a global driver of climate variability, but its influence on North American and European winters is underestimated in current seasonal forecasts. Solving this problem would reduce uncertainty and increase skill in winter prediction in these regions.
The underestimation of the El Niño teleconnection amplitude has strong statistical significance (cf. Figure 3f) despite the uncertainty of the observed amplitude due to the limited number of observed events (cf. Deser et al., 2017). This level of significance was found to remain when including pre-satellite era JRA-55 and ERA5 reanalysis data (starting from 1958/59 and 1950/51 respectively; not shown).
Consistent with our findings, it has recently been found that the strength of the ENSO-North Pacific teleconnection is also underestimated in December-initialized subseasonal forecasts looking 6 weeks ahead (Garfinkel et al., 2022). Our results are also consistent with Knight et al. (2022), who found that the signal-to-noise paradox persists in seasonal forecasts when the tropics are relaxed to observed conditions. Although we found no evidence of systematic underestimation of the tropical forcing of the El Niño teleconnection, we found that the western Pacific tropical forcing of the La Niña teleconnection is significantly underestimated in most of the models. As the North Pacific response to La Niña is less weak in most models than the response to El Niño (cf. Figure 2), it is likely that the extratropical cause of the weak El Niño teleconnection does not have a symmetric effect during La Niña. Further work focusing on ensemble members with the most accurate teleconnections may provide more insight into the origin of the issue.
We show that the modeled North Pacific-North Atlantic teleconnection is accurate, which is highly relevant for the overall ENSO-North Atlantic teleconnection. Alteration of the Hadley Cell during El Niño (Wang, 2002) leads to a teleconnection pathway via the tropical Atlantic instead of the North Pacific (Toniazzo & Scaife, 2006). Further work is necessary to establish the capability of seasonal forecasts at representing this teleconnection.
This study demonstrates that current seasonal forecasts are unable to capture the strength of the atmospheric response to ENSO in the North Pacific, which in turn affects ENSO teleconnections to the North Atlantic. This has an impact on seasonal prediction of wind speed, temperature, and precipitation during winter in North America and Europe.

Figure 4. Teleconnections between the Pacific and Atlantic basins. 300 hPa geopotential height response to a unit deepening of the Aleutian Low, defined as the difference between composites of geopotential height during years with a deeper-than-average Aleutian Low and those with a less deep Aleutian Low, divided by the absolute difference in the mean Aleutian Low for the same sets of years, for JRA-55 (a), the multimodel mean (b) and individual models ((c)-(g), labeled). Black line contours on (a) bound regions where the difference is significant at the 10% level in observations according to a two-tailed t test. Gray line contours on (b) bound regions where the observed and multimodel results are significantly different at the 10% level according to a two-tailed t test.

Data Availability Statement
Hindcast data from the five seasonal forecast systems used in this work are freely available online (https://doi.org/10.24381/cds.0b79e7c5 for geopotential and wind component data, and https://doi.org/10.24381/cds.68dd14c3 for all other fields), with the following originating centre and system labels: UK Met Office, 15 for UKMO; ECMWF, 5 for ECMWF; Météo France, 8 for M-F; DWD, 21 for DWD; CMCC, 35 for CMCC. Hindcast data were produced by the institutes that developed each forecast system: the UK Met Office for UKMO (MacLachlan et al., 2015); the European Centre for Medium-Range Weather Forecasts for ECMWF (Johnson et al., 2019); Météo-France for M-F (Voldoire et al., 2019); the Deutscher Wetterdienst for DWD (Fröhlich et al., 2021); the Euro-Mediterranean Center on Climate Change for CMCC (Sanna et al., 2016).