Diverse Surface Signatures of Stratospheric Polar Vortex Anomalies

The Arctic stratospheric polar vortex is an important driver of winter weather and climate variability and predictability in North America and Eurasia, with a downward influence that on average projects onto the North Atlantic Oscillation (NAO). While tropospheric circulation anomalies accompanying anomalous vortex states display substantial case‐by‐case variability, understanding the full diversity of the surface signatures requires larger sample sizes than those available from reanalyses. Here, we first show that a large ensemble of seasonal hindcasts realistically reproduces the observed average surface signatures for weak and strong vortex winters and produces sufficient spread for single ensemble members to be considered as alternative realizations. We then use the ensemble to analyze the diversity of surface signatures during weak and strong vortex winters. Over Eurasia, relatively few weak vortex winters are associated with large‐scale cold conditions, suggesting that the strength of the observed cold signature could be inflated due to insufficient sampling. For both weak and strong vortex winters, the canonical temperature pattern in Eurasia only clearly arises when North Atlantic sea surface temperatures are in phase with the NAO. Over North America, while the main driver of interannual winter temperature variability is the El Niño–Southern Oscillation (ENSO), the stratosphere can modulate ENSO teleconnections, affecting temperature and circulation anomalies over North America and downstream. These findings confirm that anomalous vortex states are associated with a broad spectrum of surface climate anomalies on the seasonal scale, which may not be fully captured by the small observational sample size.

Yet, while the average response is robust, the relationship between vortex strength and surface circulation is nuanced and case dependent. For example, only around two-thirds of SSWs are followed by a persistent negative NAO event, and less than a quarter of negative NAO events are preceded by an SSW (Domeisen, 2019). Departures from the most typical (or "canonical") tropospheric response have been linked to various differences across SSW events, such as the evolution of the stratospheric flow and its rate and depth of downward propagation (Karpechko et al., 2017; Maycock & Hitchcock, 2015) and the state of the troposphere as the SSW unfolds (Afargan-Gerstman & Domeisen, 2020; Garfinkel et al., 2013; White et al., 2019). Moreover, the vortex has been shown to influence surface climate in more diverse ways than is revealed by considering solely the NAO (Beerli & Grams, 2019; Domeisen et al., 2020), particularly over North America (Cohen et al., 2021; Kretschmer et al., 2018; Lee et al., 2022). There is also a complex relationship between the stratospheric vortex and tropical variability, including the Madden-Julian Oscillation and El Niño-Southern Oscillation (ENSO), which can influence the state of the vortex through tropical-extratropical teleconnections (Barnes et al., 2019; Domeisen et al., 2019; Green & Furtado, 2019) and can also directly modulate the tropospheric response to ENSO (Jiménez-Esteve & Domeisen, 2018; Knight et al., 2021). Furthermore, tropospheric precursors to extreme stratospheric states, such as blocking, may induce systematic tropospheric temperature anomaly patterns before and during the onset of anomalous vortex states, independent of the downward propagation of the stratospheric vortex anomaly (e.g., Kolstad & Charlton-Perez, 2011).
We speculate that expectations of tropospheric responses to anomalous vortex states have been influenced by recent amplified manifestations of the "canonical" response, such as the cold winter in Northern Europe in 2010 before, during and after an SSW (Cohen et al., 2010), the "Beast from the East" episodes in 2018 and 2021 (both occurring after SSWs), and the strong vortex winter of 2020 (Lawrence et al., 2020; Rupp et al., 2022), which was associated with record heat over northern Eurasia (Schubert et al., 2022). These events have also coincided with increased appreciation of the impact of stratospheric variability and our improved ability to represent it within models over the last decade (Domeisen et al., 2020a). It is therefore important to better understand and quantify the diversity in the relationship between vortex strength and surface climate, which is what we address herein.
While case studies and sensitivity experiments are essential for understanding the complexity of stratosphere-troposphere interactions, a ubiquitous challenge in climate prediction is that there is a limited number of years and events to study. For instance, since more frequent observations started in 1958 there have only been roughly six SSWs per decade. To deal with the limited sample size, Oehrlein et al. (2021) used a bootstrapping technique to explore the distribution of surface impacts of SSWs. A different approach is to leverage large climate or forecast model ensemble simulations to obtain a larger sample size for studying SSWs (Bett et al., 2022; Monnin et al., 2022; Spaeth & Birner, 2021; Wang et al., 2020) and climate variability more generally (Breivik et al., 2013; Brunner & Slater, 2022; Chen & Kumar, 2017; Kelder et al., 2020; Kent et al., 2017; Thompson et al., 2017; van den Brink et al., 2004, 2005; Weaver et al., 2014).
Here, we harness hindcasts and forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) seasonal prediction system SEAS5 (Johnson et al., 2019) to obtain 3,000 "potential" winter seasons between 1981 and 2020, 75 times more than the 40 observed winters in the same period. We focus on the period from December to March (DJFM), as this period has the highest frequency of SSWs (Baldwin et al., 2003), and we study seasonal means to filter out the impacts of intraseasonal variations. We emphasize that this is not a study of specific stratospheric vortex events like SSWs. It is a study of the surface signatures related to polar vortex anomalies on the seasonal scale.
First, we validate the model's representation of stratospheric vortex variability and its linkages between anomalous vortex states and surface variables by comparing with reanalysis. Then, we investigate the most common surface signatures of anomalous vortex states by means of a clustering algorithm based on temperature anomalies over land, studying Eurasia and North America separately. Our results highlight the wide diversity of surface signatures related to both strong and weak vortex states across the two continents on the seasonal scale. We conclude by discussing the results and their relevance for forecast interpretation.

Data and Methods
We use data from two sources: the ERA5 reanalysis (Hersbach et al., 2020), and hindcasts and forecasts from the seasonal prediction system SEAS5. The variables we study are the zonal wind at 10 hPa and 60°N, 2-meter temperature (T2), sea level pressure (SLP), and sea surface temperature (SST), all averaged over the extended DJFM winter season.
The SEAS5 hindcasts and forecasts go back to 1981 and are initialized on the first day of every month throughout the year. Each model run extends 7 months into the future from initialization. As we study DJFM, we are able to use model runs that are initialized in early September (lead time 4-7 months), October (lead time 3-6 months), and November (lead time 2-5 months). We do not use December initializations, as the spread among the ensemble members for the DJFM period is too narrow for the purposes of our study (especially for SST, given its slower evolution than SLP and T2). For each of the hindcast dates between 1981 and 2016, there are 25 ensemble members. From 2017 to 2020, each of the forecasts has 51 ensemble members, but we only use the first 25 of these to have the same number of members per year for the entire 1981 to 2020 period. For each year we therefore have 75 ensemble members (25 members each for September, October, and November initializations), yielding a total of 3,000 DJFM potential winters over the 40 yr record (where we adopt the term "potential" used by Spaeth and Birner (2021) to denote "potential SSWs"). All initializations are treated equally in this analysis, without distinguishing with respect to lead time.
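The ensemble bookkeeping described above can be verified with a few lines of arithmetic. This is only an illustrative sketch; the variable names are ours and are not taken from the analysis code.

```python
# Count the "potential" winters obtained from the SEAS5 hindcasts and forecasts.
first_winter, last_winter = 1981, 2020
init_months = ["September", "October", "November"]  # December runs are not used
members_per_init = 25  # only the first 25 of the 51 members are kept for 2017-2020

n_years = last_winter - first_winter + 1
members_per_year = members_per_init * len(init_months)
n_potential_winters = members_per_year * n_years

print(n_years, members_per_year, n_potential_winters)
```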
As a metric for vortex strength, we calculate the DJFM seasonal mean of zonally averaged zonal wind at 60°N and 10 hPa. We also compute seasonal means of the other variables, on a grid point basis. For most of the analysis, we use standardized values, where the standardization is based on seasonal climatological means and standard deviations (SD). The unit is SD. The standardized version of the stratospheric winds is referred to as U10. Note that there may be other manifestations of seasonal vortex variability, such as its shape or location; we do not explicitly examine those here.
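As a minimal sketch of the standardization step, the snippet below converts seasonal means to units of SD. It assumes that the climatological mean and standard deviation are computed over all pooled samples of a data set and that the population standard deviation is used; neither detail is specified in the text. The wind values are synthetic placeholders, not SEAS5 data.

```python
import numpy as np

def standardize(seasonal_means):
    """Return anomalies in units of standard deviation (SD) along axis 0."""
    x = np.asarray(seasonal_means, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Synthetic stand-in for DJFM-mean zonal-mean zonal wind (m/s) at 60N and 10 hPa.
rng = np.random.default_rng(0)
u_wind = rng.normal(loc=25.0, scale=8.0, size=3000)

u10 = standardize(u_wind)  # standardized vortex strength index
```

The same function works on gridded fields of shape (samples, lat, lon), in which case each grid point is standardized separately.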
As the T2 and SST fields have substantial trends, we linearly detrend these data separately for each grid point. We do not detrend the other variables, as their trends have negligible impacts on the results.
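The grid-point detrending can be sketched as an ordinary least-squares fit removed at every grid point. The function name and the synthetic field below are hypothetical, and how the ensemble dimension is handled for SEAS5 (for example, fitting the trend to the ensemble mean) is not stated in the text and is not addressed here.

```python
import numpy as np

def detrend_linear(field, years):
    """Remove the least-squares linear trend along axis 0 at every grid point.

    The time mean at each grid point is preserved.
    """
    field = np.asarray(field, dtype=float)
    t = np.asarray(years, dtype=float)
    t = t - t.mean()                                   # centered time axis
    flat = field.reshape(field.shape[0], -1)           # (time, points)
    slope = (t @ (flat - flat.mean(axis=0))) / (t @ t) # one slope per grid point
    detrended = flat - np.outer(t, slope)
    return detrended.reshape(field.shape)

# Synthetic T2 field with a warming trend of 0.03 units per year plus noise.
years = np.arange(1981, 2021)
rng = np.random.default_rng(1)
t2 = 0.03 * (years - 1981)[:, None, None] + rng.normal(size=(40, 3, 4))
t2_detrended = detrend_linear(t2, years)
```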
We calculate an NAO index as the standardized principal component corresponding to the first Empirical Orthogonal Function (EOF) of (non-standardized) DJFM SLP anomalies in the domain 20°N-80°N and 90°W-40°E. The EOFs are calculated separately for each data set using the eofs Python software package (Dawson, 2016), and the first EOF explains 54% and 44% of the SLP variance for ERA5 and SEAS5 data, respectively. The correlation between the NAO index and SLP and T2 is shown in Figure A1.
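To illustrate the idea behind the NAO index, the sketch below computes a leading EOF and its standardized principal component with a plain NumPy singular value decomposition. This is a stand-in for, not a reproduction of, the eofs package used in the study. The square-root-of-cosine latitude weighting is a common convention that we assume here; the text does not state whether it was applied. The input field is synthetic, and the sign of an EOF is arbitrary, so in practice it is fixed by convention.

```python
import numpy as np

def leading_eof(anom, lat):
    """Leading EOF, standardized PC, and explained variance fraction.

    anom: (n_winters, n_lat, n_lon) SLP anomalies; lat: latitudes in degrees.
    """
    n_t, n_lat, n_lon = anom.shape
    w = np.sqrt(np.cos(np.deg2rad(lat)))[None, :, None]  # area weighting (assumed)
    x = (anom * w).reshape(n_t, -1)
    x = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    pc = u[:, 0] * s[0]
    pc_std = (pc - pc.mean()) / pc.std()
    eof = vt[0].reshape(n_lat, n_lon)
    var_frac = s[0] ** 2 / np.sum(s ** 2)
    return eof, pc_std, var_frac

# Synthetic SLP anomalies: a north-south dipole with random amplitude plus noise.
rng = np.random.default_rng(2)
lat = np.linspace(20, 80, 13)
lon = np.linspace(-90, 40, 27)
dipole = np.sin(np.deg2rad(6 * (lat - 20)))[:, None] * np.ones(lon.size)
amplitude = rng.normal(size=40)
slp = 5.0 * amplitude[:, None, None] * dipole + rng.normal(size=(40, 13, 27))

eof, nao, frac = leading_eof(slp, lat)
```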
The Niño 3.4 index is calculated as the standardized area-weighted average of SST anomalies between 5°S and 5°N and 170°E and 110°W. We compute another index for each data set in order to assess the pattern correlation between SST anomalies for a given sample and the typical SST anomalies associated with the NAO index in the North Atlantic. First, we compute the correlation between the NAO index and SST anomalies over all available winters, which is shown in the maps in Figure A2. Second, we multiply the SST anomalies in each grid point between 25°N and 65°N and between 80°W and 10°E by the correlations in Figure A2, and then we compute the area-weighted mean for each winter. This yields an index of length 40 for ERA5 and an index of length 3,000 for SEAS5. We define the standardized version of these as the "NAO similarity" index. Positive values indicate that the SST anomaly pattern is consistent with a positive NAO index.
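The two-step construction of the NAO similarity index can be sketched as follows. The function names are hypothetical, the input is synthetic, and the sketch assumes an ocean-only box without missing values (real SST fields require land masking). The same area-weighted mean is the operation underlying the Niño 3.4 index.

```python
import numpy as np

def area_weighted_mean(field, lat):
    """Cosine-latitude weighted mean over the last two (lat, lon) axes."""
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones(field.shape[-1])
    return (field * w).sum(axis=(-2, -1)) / w.sum()

def nao_similarity(sst_anom, nao_index, lat):
    """Standardized projection of SST anomalies onto the NAO-SST correlation map."""
    n = sst_anom.shape[0]
    sst_z = (sst_anom - sst_anom.mean(axis=0)) / sst_anom.std(axis=0)
    nao_z = (nao_index - nao_index.mean()) / nao_index.std()
    # Step 1: Pearson correlation between the NAO index and SST at each grid point.
    corr_map = np.tensordot(nao_z, sst_z, axes=(0, 0)) / n
    # Step 2: weight the anomalies by the correlations and average over the box.
    raw = area_weighted_mean(sst_anom * corr_map, lat)
    return (raw - raw.mean()) / raw.std()

# Synthetic example: an NAO-driven tripole-like SST pattern plus noise.
rng = np.random.default_rng(3)
lat = np.linspace(25, 65, 9)
pattern = np.cos(np.deg2rad(9 * (lat - 25)))[:, None] * np.ones(19)
nao = rng.normal(size=200)
sst = nao[:, None, None] * pattern + 0.5 * rng.normal(size=(200, 9, 19))

similarity = nao_similarity(sst, nao, lat)
```

In this synthetic setting the similarity index closely tracks the NAO index by construction; in the hindcasts the two can differ, which is what makes the index informative.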
For each cluster, we compute the mean anomalies of T2, SLP, and SST in each grid point, and the significance of the anomalies is estimated using a bootstrapping approach. If n is the number of ensemble members in a cluster, we calculate 1,000 synthetic mean anomalies by selecting n random ensemble members each time. If the mean anomaly of the n actual ensemble members in the cluster is greater than the 97.5th percentile or smaller than the 2.5th percentile of the synthetic set of anomalies, we deem it significant at the 5% level.
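A minimal sketch of this resampling test for a single grid point is given below. The text does not specify the pool from which the random members are drawn or whether they are drawn with replacement; here we assume draws without replacement from all available members. The function name and data are illustrative only.

```python
import numpy as np

def bootstrap_significant(all_values, cluster_idx, n_boot=1000, seed=0):
    """Two-sided 5% test of a cluster mean against random draws of equal size."""
    rng = np.random.default_rng(seed)
    all_values = np.asarray(all_values, dtype=float)
    n = len(cluster_idx)
    actual = all_values[cluster_idx].mean()
    # Synthetic mean anomalies from n randomly selected members (assumed pool: all).
    synthetic = np.array([
        all_values[rng.choice(all_values.size, size=n, replace=False)].mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(synthetic, [2.5, 97.5])
    return bool(actual < lo or actual > hi), actual, (lo, hi)

# Synthetic anomalies at one grid point; the "cluster" holds the 200 warmest members.
rng = np.random.default_rng(4)
values = rng.normal(size=3000)
warm_cluster = np.argsort(values)[-200:]

significant, cluster_mean, interval = bootstrap_significant(values, warm_cluster)
```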

Evaluating the Model Representation of the Vortex and Its Surface Signatures
First, we assess how the vortex strength in SEAS5 compares to that of ERA5. In Figure 1a, empirical cumulative distributions of the DJFM zonal-mean zonal wind component at 10 hPa, 60°N are shown for the two data sets. To check whether the differences can be explained by the low number of samples in ERA5 (40), we generated 10,000 synthetic SEAS5 40 yr time series by selecting a random ensemble member for each year of each time series. The shading spans the interval between the 2.5th and 97.5th percentile of the distributions of the synthetic SEAS5 time series. Focusing on DJFM, the zonal winds are generally stronger in ERA5 than in SEAS5, confirming earlier results from Portal et al. (2022). In the two middle quarters, the differences are especially large, and the graphs for the individual winter months (Figures 1b-1e) indicate that the largest differences occur in early winter (December and January), while the distributions in February and March are more similar in the two data sets. The weak vortex bias in SEAS5 is consistent with Monnin et al. (2022), who found that SEAS5 produced an average of 0.88 SSWs per winter between 1981 and 2019, compared with 0.71 SSWs per winter in ERA5.
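The construction of synthetic 40 yr series can be sketched as below: one random ensemble member is drawn for each year, and the 2.5th to 97.5th percentile envelope is taken across the sorted series. How the shading in Figure 1 was computed in detail is not stated, so the envelope over order statistics is our assumption, and the ensemble values are synthetic.

```python
import numpy as np

def synthetic_series(ens, n_series=10000, seed=0):
    """Draw one random member per year; ens has shape (n_years, n_members)."""
    rng = np.random.default_rng(seed)
    n_years, n_members = ens.shape
    pick = rng.integers(0, n_members, size=(n_series, n_years))
    return ens[np.arange(n_years), pick]  # shape (n_series, n_years)

# Synthetic stand-in for DJFM-mean zonal wind: 40 years x 75 members.
rng = np.random.default_rng(5)
ens = rng.normal(loc=25.0, scale=8.0, size=(40, 75))

series = synthetic_series(ens)
# Sorting each series gives the values at the empirical cumulative
# probabilities 1/40, 2/40, ..., 40/40; the envelope is taken across series.
sorted_series = np.sort(series, axis=1)
lower, upper = np.percentile(sorted_series, [2.5, 97.5], axis=0)
```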
In all the individual months and for the DJFM average, the sign of the skewness parameter of the two data sets is the same. The skewness in December is strongly and significantly negative (−0.56 in SEAS5 and −0.72 in ERA5). It is encouraging that the distributions match well in the lower and upper quarters (which are the focus of our study) despite the weaker zonal wind in SEAS5. More important for the purposes of this study is that the mean near-surface temperature responses to anomalous vortex states in SEAS5 and ERA5 are comparable. To check that, we divide the data into four equal parts (quarters) based on U10. The quartiles (the boundaries between the quarters) are marked in Figure 1, and the lower quartiles are 19 m s⁻¹ (ERA5) and 17 m s⁻¹ (SEAS5), while the upper quartiles are 31 m s⁻¹ (ERA5) and 25 m s⁻¹ (SEAS5).
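The split into quarters can be sketched with quartile thresholds, as below. The index values are synthetic and the function name is hypothetical; label 0 corresponds to the weak quarter and label 3 to the strong quarter.

```python
import numpy as np

def quarter_labels(u10):
    """Assign each sample to a quarter: 0 = weakest vortex, 3 = strongest."""
    q1, q2, q3 = np.percentile(u10, [25, 50, 75])
    return np.digitize(u10, [q1, q2, q3])

rng = np.random.default_rng(6)
u10 = rng.normal(size=3000)          # synthetic standardized vortex index

labels = quarter_labels(u10)
counts = np.bincount(labels, minlength=4)
weak, strong = u10[labels == 0], u10[labels == 3]
```

With 3,000 samples and no ties, each quarter holds 750 members, matching the group size used for SEAS5 in the following analysis.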
The mean DJFM standardized T2 and SLP anomalies corresponding to DJFM U10 values in the lower and upper quarters (hereafter "weak" and "strong" quarters) are shown for ERA5 in the left column of Figure 2, and for SEAS5 in the right column. Note that as the ERA5 averages are based on only 10 winters in each quarter, we do not assess the statistical significance of the averages shown in this figure. The main purpose of Figure 2 is to provide a qualitative, visual comparison between the two data sets. In SEAS5, each quarter has 750 potential winters.
In the weak vortex cases, the T2 and SLP patterns in ERA5 (Figure 2a) are consistent with a negative NAO signature, both in terms of the classic SLP dipole over the North Atlantic and the corresponding T2 quadrupole (Hurrell, 1996; Stephenson & Pavan, 2003). Roughly opposite patterns are found in the strong vortex cases in Figure 2c.
It is striking that the maximum magnitudes of the mean surface anomalies in the weak U10 quarter are considerably larger in ERA5 (Figure 2a) than in SEAS5 (Figure 2b). The latter anomalies appear "washed-out", but we emphasize that the ERA5 means are based on only 10 winters. It would be straightforward to find a set of 10 SEAS5 winters (out of the 750) that produced similarly strong mean anomalies. We now check if the differences between Figures 2a and 2b can be ascribed to sample size differences by comparing area-weighted mean T2 anomalies in both data sets for the regions outlined in the maps in Figure 2.
In Figure 3, the orange histogram in each panel shows the full distribution of the 750 ensemble members in the weak and strong U10 quarters. These are not directly comparable to the ERA5 averages, which are computed from only 40 data points (one for each year). To obtain a comparable metric, we create 10,000 synthetic 40 yr time series based on SEAS5 data, where for each year we pull DJFM-mean U10, area-weighted T2 means, and the NAO index from the same random ensemble member out of the 75 members available. Each value in the resulting 10,000 time series is then allocated to its U10 quarter, and for each subset (each consisting of 10 values), we compute the mean value of T2 in the four subregions of Eurasia and North America, and the mean NAO index. The interval between the 2.5th and 97.5th percentiles of these 10,000 values is shown with gray shading for each U10 quarter in [...] a more neutral mean European T2 anomaly (−0.1), and the ERA5 value is inside, but close to the edge of, the 95% SEAS5 interval. Thus, it cannot be ruled out that the ERA5 T2 anomaly is strongly negative by chance due to small sampling (only 10 winters). Another interpretation is that SEAS5 has a positive T2 bias in the weak U10 quarter because the model does not produce sufficiently cold conditions during weak vortex winters. Both interpretations may be true. In the strong U10 quarter (Figure 3f), the positive European T2 anomaly in ERA5 (0.4) is closer to the mean SEAS5 value (0.2). For all the other regional T2 averages, the mean ERA5 values are comfortably inside the 95% SEAS5 intervals.
In response to stratospheric vortex variability, the NAO index often shows a clearer signal than surface temperatures (Domeisen et al., 2020b). Hence, we now investigate the distribution of the NAO index in the weak (Figure 3e) and strong U10 quarters (Figure 3j). As mentioned previously, in reanalyses about two-thirds of SSWs are followed by dominantly negative NAO conditions. This frequency was recently confirmed using seasonal hindcast data by Bett et al. (2022). In SEAS5, 68% of the ensemble members in the weak U10 quarter are NAO-negative, which corresponds well with the expected two-thirds frequency. This suggests that the relationship holds on the seasonal scale, even though the weak U10 quarter includes less extreme U10 values and not just SSWs. In the strong U10 quarter, 75% of the SEAS5 members are NAO-positive, and the ERA5 and SEAS5 mean values agree well. We note that the mean ERA5 NAO values in both U10 quarters are inside the 95% intervals for SEAS5 by solid margins.
On the subcontinental scale for T2 and for the NAO index, Figure 3 shows that the ERA5 signatures are inside the range of natural variability in SEAS5. We interpret this as an indication that despite the stratospheric zonal wind biases evident in Figure 1, the SEAS5 model produces realistic linkages between anomalous vortex states and surface weather patterns during winter.
Arguably the most interesting feature in Figure 3 is the broad spectrum of each quarter of the SEAS5 data (orange histograms), which shows that the surface signatures have substantial seasonal-scale diversity. In the next section, we investigate the diversity of the surface signatures in detail.

Diversity of the Surface Temperature Response
The SEAS5 averages in Figure 2 are based on a large number of ensemble members and therefore obscure the substantial variability evident in Figure 3. To untangle this variance, we now use two different methods on the large number of ensemble members in the weak and strong U10 quarters separately. As the middle quarters represent less extreme states and are overall closer to the climatological average, we do not consider them further. In Sections 4.1 and 4.2, we use a k-means clustering analysis (see Methods), which is based on area-weighted T2 anomalies for land points inside the Eurasian and North American regions. We perform the analysis on each region separately. The goal of this analysis is to let the algorithm sort the data into clusters in which the spatial distribution of the T2 anomalies over land is similar. For practical purposes, we create three clusters for each combination of quarter and continent. For each cluster, we show the mean T2 and SLP anomalies across all the ensemble members, and we also show the mean SST anomalies for each cluster to investigate possible links to large-scale SST patterns such as ENSO. Given that there is no objective choice of k in this case, we tested various options from two to six and concluded that three clusters give a representative picture of the diversity of surface signatures. Although differences in the population of each cluster could be taken as a measure of their relative likelihood (such as is the case with cluster-based weather regimes), we caution that there is insufficient evidence to support translating these statistics to real-world variability. Figure 4 shows the mean DJFM T2, SLP, and SST anomalies in the six Eurasian k-means clusters. The clusters are named E-W1 to E-S3 (the letters "E", "W", and "S" refer to "Eurasia", "Weak vortex", and "Strong vortex", respectively).
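To make the clustering step concrete, the sketch below implements plain Lloyd's k-means in NumPy with k = 3. It is not the implementation used in the study, whose algorithmic details (library, initialization, number of restarts) are not given in the text. Area weighting is approximated by scaling each grid point by the square root of the cosine of its latitude, which is an assumption on our part, and the T2 anomalies are synthetic.

```python
import numpy as np

def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def kmeans(x, k=3, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; x has shape (n_members, n_features)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(x, centers)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(x, centers), centers

# Synthetic T2 anomalies at 6 x 10 land points for 750 members.
rng = np.random.default_rng(7)
lat = np.linspace(40, 70, 6)
weights = np.repeat(np.sqrt(np.cos(np.deg2rad(lat))), 10)  # assumed weighting
patterns = np.array([
    -np.ones(60),                                   # cold everywhere
    np.ones(60),                                    # warm everywhere
    np.concatenate([np.zeros(30), -np.ones(30)]),   # cold in the north only
])
t2 = patterns[rng.integers(0, 3, size=750)] + 0.5 * rng.normal(size=(750, 60))

labels, centers = kmeans(t2 * weights, k=3)
cluster_sizes = np.bincount(labels, minlength=3)
```

Because k-means depends on the initial centers, practical applications usually repeat the algorithm from several starting points and keep the solution with the smallest within-cluster variance.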

Eurasia
Before we describe each cluster in detail, we summarize some key results upfront. The strongest surface anomalies in Eurasia occur when the state of the vortex and the SST anomaly pattern in the North Atlantic both influence the large-scale atmospheric flow in the same direction. A case in point is the E-W1 cluster, which has both a weak vortex and an SST pattern favorable for a negative NAO. Similarly, the E-S1 cluster has a strong vortex in combination with an SST pattern that is favorable for a positive NAO. These are the two clusters which conform best with the canonical Eurasian surface signatures associated with weak and strong vortex strength. We do not know if the SST pattern emerges as a response to the atmospheric flow or if it was pre-existing, but the patterns in E-W1 and E-S1 are in any case consistent with a feedback mechanism between North Atlantic SSTs and the atmosphere. We note that the mean Niño 3.4 values are near-neutral in all the clusters, but this does not mean that linkages between ENSO and Eurasian surface signatures are nonexistent, as we shall see in Section 4.3.
Figure caption note: In all the panels, anomalies with a magnitude greater than 0.15 SD are significant at the 5% level, according to a bootstrapping test (see Section 2).
Recall from Figure 3 that the average SEAS5 European and Asian T2 anomalies in the weak U10 quarter were found to be negative. Yet, Figure 4a shows that E-W1 is the only cluster in the weak U10 quarter that is clearly cold for most of the continent. All 172 members of E-W1 have a negative T2 anomaly inside the whole Eurasian region, indicating that the cluster captures continental-scale cold anomalies. The NAO index in E-W1 is strongly negative, so the cluster conforms with the expected surface signature of a weak vortex winter. The mean SST anomaly in the E-W1 cluster in the North Atlantic (Figure 4b) is similar to a pattern typical of a negative NAO (see Figure A2), and the NAO similarity index is −0.6.
The remaining two Eurasian clusters in the weak U10 quarter both have more members than E-W1, and their surface signatures are distinctly different. The mean NAO index in E-W2 is neutral, and the European and Asian T2 anomaly is positive. Almost all (97%) of the cluster members are anomalously warm when averaged over Eurasia. E-W3, which is the largest of the three weak vortex clusters, has neutral T2 anomalies in Europe and cold conditions in Asia. The NAO index is negative, but the spatial distribution of the SLP anomalies is quite different to the one in E-W1. In contrast to E-W1, the SST signatures of E-W2 and E-W3 in the North Atlantic (Figure 4b) bear little or no similarity to the typical NAO-negative pattern (Figure A2), with near-zero mean NAO similarity index values. The lack of a favorable SST pattern in these clusters could be due to a lack of atmospheric forcing (because the NAO is insufficiently negative), but it is also possible that the NAO in E-W2 and E-W3 does not develop persistent negative conditions because the SST forcing is unfavorable. Possibly, both explanations are true. The fact remains that in E-W1, two factors which are known to influence the NAO index in a negative direction (a weak vortex and a favorable SST pattern in the North Atlantic) are present. In E-W2 and E-W3, only one of these factors is in place: a weak vortex. We revisit the linkages between a weak vortex and North Atlantic SSTs in Section 4.3.
The surface signatures associated with the strong vortex winters in the strong U10 quarter, shown in the bottom row of Figure 4a, exhibit considerable symmetry with the signatures during the weak vortex winters. E-S1 is essentially a mirror image of E-W1, with a strongly positive NAO index and a clearly positive average Eurasian T2 anomaly (all the 238 ensemble members are anomalously warm in Eurasia). The SST pattern of E-S1 (Figure 4b) resembles the pattern associated with a positive NAO (Figure A2), with a mean NAO similarity index of 0.6. In E-S2, as the T2 and SST patterns are the opposite of those for E-W2 and the NAO index is neutral, the cluster average is clearly unlike the canonical surface signature of a strong polar vortex. As was the case for the weak vortex cluster E-W3, the third strong vortex cluster, E-S3, which consists of 47% of the strong vortex ensemble members, is the largest in its vortex strength category. It has a moderately strong positive NAO index of 0.6 which is reflected in its generally NAO-like surface response, although the magnitude of the anomalies is smaller than in E-S1. This may be related to the weaker NAO similarity index in E-S3 compared to E-S1.
As we saw for the clusters in the weak quarter, the expected surface signature associated with a strong vortex (strongly positive NAO and warm conditions in Eurasia) is only in place when the SST pattern is favorable. These conditions occur in E-S1, and partially in E-S3.

North America
North America has not traditionally been seen as a region where surface signatures can be clearly linked to vortex anomalies (perhaps largely due to its position upstream of NAO variability). However, some recent extreme events and research have pointed to interesting linkages between the stratosphere and the troposphere over North America (Cohen et al., 2021; Kretschmer et al., 2018). We now investigate the clusters based on North American T2 anomalies, which are shown in Figure 5. The weak vortex clusters are named NA-W1 to NA-W3 (following the naming convention introduced for the Eurasian clusters), and the strong vortex clusters are named NA-S1 to NA-S3.
As might be expected given the closer proximity and the known role of ENSO variability on the Pacific-North American region, in general the North American clusters have a stronger relationship with ENSO than the Eurasian clusters. NA-W1 and NA-S1 are both linked to strong La Niña states, and NA-W3 and NA-S3 are both associated with moderate El Niño states. Although ENSO appears to be the most important driver, our results highlight the modulating influence of the stratospheric vortex state on the teleconnection of ENSO to the North Atlantic-Eurasia region and the potential upstream influence on North America (Butler et al., 2014;Domeisen et al., 2019).
ENSO has a stratospheric pathway, which (on seasonal timescales) is typically linked to a weaker than normal vortex during El Niño, and vice versa during La Niña. This stratospheric pathway and its downward influence can either reinforce or dampen the NAO response to the ENSO tropospheric teleconnection (Jiménez-Esteve & Domeisen, 2020; Polvani et al., 2017). For example, NA-W3 shows a canonical Pacific-North American teleconnection during El Niño, with an anomalous Aleutian low, a trough over the southeastern US, and a ridge over Canada, while NA-S1 shows almost a mirror image associated with the canonical La Niña teleconnection (Butler et al., 2014; Domeisen et al., 2019). In both these clusters, the anomalous vortex strength appears to reinforce the sign of the tropospheric teleconnection of ENSO to the NAO, which has an associated signature over Europe. NA-W1 and NA-S3 also have SST patterns indicative of significant ENSO forcing, but these clusters instead fall in vortex quarters which oppose the ENSO tropospheric teleconnection influence on the NAO. For example, NA-W1 occurs during a strong La Niña state but with a weak stratospheric vortex, so the influence of La Niña and the vortex essentially cancel out over the North Atlantic.
These results thus indicate the role of the vortex in modulating the expected ENSO teleconnection to the North Atlantic region, including its surface signatures both downstream and upstream and associated seasonal-scale predictability. The two remaining North American clusters, NA-W2 and NA-S2, are nearly ENSO-neutral and thus represent the linkages of the vortex to North American surface temperatures that occur largely independently of ENSO. Although these patterns are based on opposing states of the polar vortex, both patterns are associated with cold over the North American region, though the regional spatial patterns are different (NA-W2 is only cold in CONUS, while NA-S2 is also cold in Canada/Alaska). The weak vortex cluster NA-W2 shows an anomalous ridge stretching southward into the continent, a feature which is known to yield cold surges east of the Rockies (Colle & Mass, 1995). As a result, NA-W2 is 0.7 SD colder than NA-W3 in CONUS. The strong vortex cluster NA-S2 is notably the coldest overall North American cluster, with strong advection of Arctic air associated with an extratropical SST pattern in the north-east Pacific resembling the so-called Pacific "blob" and the SST anomalies during winter 2013/2014, which drove a similar SLP and T2 anomaly pattern across North America (Hartmann, 2015;Liang et al., 2017). The anomalous high extending from Alaska along the west coast of North America, and the downstream cold anomalies, further resemble a pattern associated with downward wave reflection by the stratosphere (Messori et al., 2022;Millin et al., 2022), consistent with the strong U10 and the positive NAO downstream (Shaw & Perlwitz, 2013).

Oceanic Influence
The cluster analysis in the previous section revealed clear links between the vortex strength, the surface signatures of T2 and SLP, and SSTs. Likely, these links are nonlinear. We now use a different method to group the weak and strong vortex ensemble members, where the aim is to compare the resulting surface signatures to the signatures yielded by the clustering analysis. First, we divide all the ensemble members into four equally sized quarters according to their NAO similarity index. This allows us to divide the model data into six new groups. The first group consists of ensemble members with both a weak vortex (i.e., in the weak U10 quarter) and a negative NAO similarity index (i.e., in the lower NAO similarity index quarter), the second group holds members with a weak vortex and a neutral NAO similarity index (i.e., in one of the two middle quarters), etc. Second, we perform the same exercise using the Niño 3.4 index instead of the NAO similarity index.
Figure 6 shows that when the vortex is weak and the North Atlantic SST pattern is favorable for negative NAO conditions (Figure 6a), the surface signature is consistent with a strongly negative NAO index and its canonical surface signature, with cold conditions in Eurasia and the eastern US. The number of ensemble members in the first group is 232, which is 24% more than the 187.5 members that the group would have by chance. By contrast, the weak vortex group with North Atlantic SST conditions associated with a positive NAO (Figure 6c) only has 120 members, which is 36% less than by chance. Its NAO index is weakly positive, and the surface signature surrounding the North Atlantic is largely the opposite of the canonical weak vortex signature. Further, it is interesting that the group with a weak vortex but neutral North Atlantic SST conditions (Figure 6b) only exhibits a weakly negative NAO index and the surface anomalies are only significant in a few regions. This group contains 53% of the ensemble members, which is roughly the same as it would contain by chance (50%).
Figure 6. Mean 2-meter temperature (filled contours) and sea level pressure (SLP; black contours every 0.3 SD starting at +/− 0.2 SD, negative dashed, positive solid) anomalies for the weak (a-c) and strong (d-f) vortex ensemble members. The columns show averages for three groups according to their North Atlantic Oscillation (NAO) similarity index, in the lower quarter (a, d), the upper quarter (c, f), and the two middle quarters in (b, e). In all the panels, anomalies with a magnitude greater than 0.2 SD are significant at the 5% level, according to a bootstrapping test (see Section 2).
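The group sizes quoted for the weak vortex quarter can be checked against the sizes expected if vortex strength and the NAO similarity index were unrelated. The short calculation below uses only numbers given in the text, except for the size of the neutral group, which we derive as the remainder of the 750 weak vortex members.

```python
n_weak = 750                    # members in the weak U10 quarter
expected_quarter = n_weak / 4   # 187.5 members expected by chance in an outer quarter
n_favorable = 232               # weak vortex, lower NAO similarity quarter
n_opposing = 120                # weak vortex, upper NAO similarity quarter
n_neutral = n_weak - n_favorable - n_opposing  # derived, not quoted in the text

def pct_vs_chance(count, expected):
    """Percentage difference between an observed count and the chance expectation."""
    return 100.0 * (count - expected) / expected

favorable_pct = round(pct_vs_chance(n_favorable, expected_quarter))
opposing_pct = round(pct_vs_chance(n_opposing, expected_quarter))
neutral_share = round(100.0 * n_neutral / n_weak)

print(favorable_pct, opposing_pct, neutral_share)
```

The result reproduces the figures in the text: 24% more than chance for the favorable group, 36% fewer for the opposing group, and a 53% share for the neutral group.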
We now consider the strong vortex winters in the bottom row of Figure 6. The group of strong vortex ensemble members with North Atlantic SST conditions favorable for a positive NAO (Figure 6f) has a strongly positive NAO and the canonical surface signature, a mirror image of Figure 6a. There are 45% more ensemble members in this group than expected by chance. The group with strong vortex conditions and North Atlantic SST conditions that conflict with the expected positive NAO (Figure 6d) has an average surface signature which is the opposite of the canonical signature and consists of 35% fewer ensemble members than expected by chance. The group with neutral North Atlantic SST conditions (Figure 6e) contains 48% of the members, has a moderate positive NAO (0.6), and a recognizable, albeit weak, canonical surface signature.
In summary, Figure 6 supports our suggestion that the canonical NAO-like signature of vortex strength preferentially occurs when the North Atlantic SST pattern promotes the same sign of the NAO. When the North Atlantic SST pattern opposes the vortex-associated NAO, unexpected surface signatures arise that may be opposite-signed to those expected from the vortex influence alone. This may be because vortex extremes that dominate the U10 seasonal signal typically influence surface climate for 4-6 weeks, and additional memory of the NAO signal from the SSTs is needed to drive persistent signatures in the NAO and surface climate on the seasonal timescales we consider here.
In Figure 7, we again divide the weak and strong vortex winters into groups, but this time according to the Niño 3.4 index. There is a strong correspondence with the clusters in Figure 5. The SLP and T2 anomalies of the weak vortex/La Niña group in Figure 7a are similar to those of NA-W1, and the strong vortex/La Niña map in Figure 7d resembles the map for NA-S1. The weak vortex/El Niño and strong vortex/El Niño maps (Figures 7c and 7f) also match those of the clusters NA-W3 and NA-S3, respectively. SEAS5 captures the ENSO stratospheric pathway, as the combination of La Niña and a strong vortex (Figure 7d) occurs 25% more often than La Niña with a weak vortex (Figure 7a). Similarly, a weak vortex is 34% more likely than a strong vortex during El Niño (Figures 7c and 7f). However, it is worth noting that a large number of weak vortex winters still occur during La Niña (and strong vortex winters during El Niño), with implications for ENSO teleconnections and their temperature impacts (Figure 5). As in Figure 5, more canonical surface temperature impacts occur when the ENSO stratospheric and tropospheric pathways act in concert. The NAO is most strongly negative on average in the weak vortex/El Niño composite (Figure 7c), while the most positive NAO occurs in the strong vortex/La Niña composite (Figure 7d). Nonetheless, for the NAO the influence of the vortex strength dominates over that of ENSO (Polvani et al., 2017); for example, the difference in the NAO during La Niña between weak and strong vortex winters is 1.1 SD, but during weak vortex winters the difference between La Niña and El Niño is only 0.3 SD. A clear separation of vortex and ENSO signatures on surface temperatures is more difficult and less linear than for the NAO.

Summary and Discussion
Anomalous stratospheric polar vortex states are linked to weather events at the surface. However, the rarity of such events hampers our understanding of the full range of stratosphere-troposphere linkages. By using a large ensemble of model simulations, and by focusing on seasonal winter means, we filter out the impact of individual vortex events and elucidate the spectrum of possible persistent surface signatures. We are also able to link vortex and surface states to the oceanic background conditions, tying together two leading drivers and predictors of Northern Hemisphere seasonal climate. We do this separately for Eurasia and North America, which are influenced by stratospheric and oceanic variability in different ways.
By comparing the performance of the SEAS5 model to the ERA5 reanalysis, we show that the forecast model realistically reproduces vortex characteristics, as well as linkages between the vortex and surface weather. Although the average NAO index and Eurasian T2 anomaly associated with a weak vortex are more negative in ERA5 than in SEAS5, we show that both metrics in ERA5 fall within the range of natural variability estimated from the much larger sample in SEAS5. This suggests that the strengths of the negative NAO and European T2 anomalies related to weak vortex events in the observed climate after 1980 could be higher than what is generally expected, perhaps simply because of the limited sample size in the reanalysis, with only 40 winters to study.
A cluster analysis performed on the SEAS5 ensemble members with the 25% weakest and strongest vortex winters yielded several notable results. Only one of the three weak vortex Eurasian clusters is clearly cold for most of the continent and has a strongly negative NAO index. This is qualitatively consistent with our finding that the mean European T2 anomaly during weak vortex winters is considerably more negative in ERA5 than in SEAS5. We find that the Eurasian surface signatures are associated with North Atlantic SSTs as well as the state of the vortex. Though our approach makes it difficult to determine with certainty whether the SSTs are forcing or responding to the atmospheric anomalies, our results suggest that the atmosphere-ocean feedback is important for maintaining the persistence of the NAO signature. For example, in the coldest of the three Eurasian weak vortex clusters, the North Atlantic SST pattern is favorable for a negative NAO, while the SST pattern is neutral with respect to the NAO in the remaining two clusters. This indicates that cold Eurasian conditions preferentially occur when there is a synergy between the oceanic conditions and the weak vortex state. Similarly, the North Atlantic SST patterns in the two strong vortex clusters with a clearly positive NAO are favorable for a positive NAO. In the strong vortex cluster with the most strongly positive NAO, the average Niño 3.4 index is negative, suggesting a role for a La Niña teleconnection consistent with previous studies (Iza et al., 2016;Polvani et al., 2017).
The linkage between tropical SSTs (e.g., ENSO) and the surface signature is more dominant in North America than in Eurasia. For both the weak and strong vortex quarters, the clusters exhibit either an El Niño, ENSO-neutral, or La Niña state. For those clusters with active ENSO forcing, the ENSO tropospheric teleconnection is then either amplified or dampened by the downward influence of the vortex on the NAO. La Niña winters exhibit the expected T2 dipole over North America, with cold conditions in the northwest and warm conditions in the southeast US, but the T2 anomalies are substantially stronger when La Niña conditions coincide with a strong vortex and a positive NAO index than when the vortex is weak and the NAO is negative. Similarly, El Niño winters are warm in the northwest US and cold in the southeast US, but the T2 anomalies are stronger when the vortex is weak and the NAO is negative than when the vortex is strong and the NAO is positive. The two remaining clusters with near-neutral ENSO demonstrate the influence of the vortex on North America independently of ENSO. Interestingly, for both the weak and strong vortex ENSO-neutral clusters the response is anomalously negative for T2 over the entire North American region, though the patterns differ spatially. The regional differences depend on exactly where the anomalous ridge sets up. Notably, the strong vortex cluster is linked to strong ridging over the western US, which has previously been linked on the sub-seasonal scale to wave reflection and associated cold air outbreaks over North America (Messori et al., 2022;Millin et al., 2022).
While we have shown evidence to support a realistic representation of stratosphere-troposphere coupling in SEAS5, it is still plausible that our results are influenced by model deficiencies. As such, performing similar analyses using other models would help ascertain the robustness of our conclusions.
Our key takeaway is that while weak vortex cases are often associated with cold temperatures over parts of Eurasia and North America, and vice versa for strong vortex cases, there are also scenarios when instead the opposite is true. In fact, a relatively low proportion of the surface impacts of weak vortex winters conform to the classic NAO-negative regime. Thus, the average or "canonical" surface signature related to seasonal-scale polar vortex variability, while robust, disguises substantial variability which we cannot fully appreciate from the relatively small observational sample size. Our results highlight the need for probabilistic predictions and a nuanced analysis of confounding factors when using forecasts of the stratosphere for predicting surface winter weather. Hence, decisions based on stratosphere-troposphere coupling, for instance in the energy markets, can be improved by a greater understanding of this variability and its relationship to concurrent SST patterns, with benefits to wider society (e.g., more reliable consumer energy prices).

Appendix A: NAO Relationships
As the NAO is a key component of the surface signatures, we show the correlation between the NAO index and SLP and T2 in ERA5 and SEAS5 in Figure A1. The correlation maps are overall very similar, indicating that SEAS5 represents NAO variability well. However, we note that there are some small differences. For SLP, the correlation pattern in the Atlantic is slightly more zonally oriented in ERA5 (Figure A1a) than in SEAS5 (Figure A1c). One effect of this is that the positive correlation for T2 in ERA5 (Figure A1b) is confined to the Eurasian continent, while in SEAS5 (Figure A1d), positive correlations are also found in the Nordic Seas. The T2 correlation in Eurasia is also higher in ERA5 than in SEAS5, which means that NAO anomalies have a stronger effect on Eurasian T2 in ERA5. We compare the relationship between the NAO index and SST in ERA5 and SEAS5 in Figure A2. The correlation patterns for the two data sets are similar. In particular, the pattern in the North Atlantic is characterized by a tripole, which is known from many previous studies to describe the relationship between the NAO and SSTs (e.g., Cassou et al., 2007;Czaja & Frankignoul, 2002;Rodwell et al., 1999). There is also a strong positive correlation in the North, Norwegian and Baltic Seas and a somewhat weaker but significant positive correlation in the Northwest Pacific.
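As an illustration of how correlation maps of this kind can be built, the sketch below computes the Pearson correlation between an index time series and each grid point of a field. It is a generic example under stated assumptions rather than the procedure used for Figures A1 and A2: the names `pearson_r` and `correlation_map`, the list-of-lists layout of the field, and the toy numbers are all hypothetical, and no significance test is included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_map(index, field):
    """Correlate an index with every grid point of a field.

    field[t][p] is the value for season (or ensemble member) t at grid
    point p; the result holds one correlation coefficient per grid point.
    """
    n_points = len(field[0])
    return [
        pearson_r(index, [row[p] for row in field])
        for p in range(n_points)
    ]


# Toy data (hypothetical): four seasons and three grid points.
nao_index = [-1.5, -0.5, 0.5, 1.5]
toy_field = [
    [-2.0, 1.5, 1.0],
    [0.0, 0.5, -1.0],
    [2.0, -0.5, -1.0],
    [4.0, -1.5, 1.0],
]
r_map = correlation_map(nao_index, toy_field)
print([round(r, 2) for r in r_map])
```

In the toy field, the first grid point varies in phase with the index, the second in antiphase, and the third is uncorrelated with it.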

Data Availability Statement
The ERA5 reanalysis and the SEAS5 hindcast data are available from the Copernicus Climate Change Service (C3S) via the Climate Data Store (https://cds.climate.copernicus.eu/cdsapp#!/search?type=dataset).
Figure A2. Correlation between the North Atlantic Oscillation (NAO) index and sea surface temperatures (SSTs), based on all the 40 DJFM seasons in ERA5 (a) and all the 3,000 DJFM ensemble members in SEAS5 (b). Correlations of magnitude greater than 0.4 are significant at the 5% level for ERA5 (0.1 for SEAS5) data.