It is shown that a global atmospheric model with horizontal resolution typical of that used in operational numerical weather prediction is able to simulate non-gaussian probability distributions associated with the climatology of quasi-persistent Euro-Atlantic weather regimes. The spatial patterns of these simulated regimes are remarkably accurate. By contrast, the same model, integrated at a resolution more typical of current climate models, shows no statistically significant evidence of such non-gaussian regime structures, and the spatial structure of the corresponding clusters is not accurate. Hence, whilst studies typically show incremental improvements in first and second moments of climatological distributions of the large-scale flow with increasing model resolution, here a real step change in the higher-order moments is found. It is argued that these results have profound implications for the ability of high resolution limited-area models, forced by low resolution global models, to simulate regional climate change signals reliably.
 There is a substantial difference in the horizontal resolution of global atmospheric models used in operational numerical weather prediction (NWP) and of the atmospheric climate models used, for example in the fifth Coupled Model Intercomparison Project (CMIP5). Atmospheric horizontal resolutions used in operational NWP are typically in the range 16–50 km whilst in CMIP5 model resolutions are typically in the range 1°–2°. Increased horizontal resolution in climate models has led to incremental improvements in various aspects of mean climate and variability, although many key processes appear to be fairly well represented at resolutions typical of CMIP5 models [e.g., Shaffrey et al., 2009; Dawson et al., 2011]. Whilst few would doubt the desirability of being able to integrate climate models at NWP resolution, climate modelling centres face numerous calls on available computer resources: the need to incorporate Earth System complexity, to make ensembles of integrations and to integrate over century and longer timescales. It is therefore hoped that, at least on planetary scales, the difference between NWP and climate resolution should not make too quantitative a difference in simulation accuracy. This hope is to some extent supported by studies which address the impact of resolution on simulations of the time mean state of the atmosphere and variance around this state [Jung et al., 2012; Dawson et al., 2012].
 On the other hand, the geometry of the climate attractor is not determined solely by its first two moments. In particular, considerable evidence has emerged over the last decade or more for quasi-persistent weather regimes over different regions of the world [Straus et al., 2007; Straus, 2010; Woollings et al., 2010a; Pohl and Fauchereau, 2012]. The existence of such regimes is not only of interest in its own right: there is evidence that in a dynamical system with regime structure, the time-mean response of the system to some imposed forcing, which here could be thought of as enhanced greenhouse gas concentration, is in part determined by the change in frequency of occurrence of the naturally occurring regimes [Palmer, 1993, 1999; Corti et al., 1999]. As such, a model which failed to simulate observed regime structures well could qualitatively fail to simulate the correct response to this imposed forcing.
 This paper presents a study of the ability of a state-of-the-art global atmospheric model, integrated in atmosphere-only mode at two different horizontal resolutions representative of NWP and climate models, to simulate Euro-Atlantic regime structures as found in reanalysis datasets. It is shown that whilst the NWP resolution model simulates the regimes well, the same model integrated at climate resolution has no statistically significant regimes at all.
 This study supports the growing recognition that there is no more complex problem in computational science than that of simulating climate, and next generation climate simulators should be developed at current NWP resolutions - the need for Earth System complexity and ensemble capability notwithstanding. For most climate institutes, this will require substantial enhancements in computing capability.
 The analysis presented here is based on daily fields of wintertime (December–March; DJFM) geopotential height on the 500 hPa pressure surface. A seasonal cycle is obtained by averaging the seasonal time series at each grid point over all years. This cycle is then smoothed using a 5-day mean before being subtracted from the daily time series to produce an anomaly time series at each grid point. Empirical orthogonal function (EOF) analysis is then used to reduce the dimensionality of the anomaly data set. The EOF analysis is performed on a European/Atlantic domain defined by the sector 30°–90°N, 80°W–40°E. The principal components (PCs), the time series of the EOF patterns, form the coordinates of a reduced phase space.
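The preprocessing steps above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names, the array layout (years × days × grid points), the edge handling of the 5-day mean, and the square-root-of-cosine-latitude area weighting are all assumptions introduced here.

```python
import numpy as np

def daily_anomalies(z, window=5):
    """Remove a smoothed seasonal cycle from daily fields.

    z: array of shape (n_years, n_days, n_points), one winter per year.
    window: odd length (days) of the running mean applied to the cycle.
    """
    cycle = z.mean(axis=0)                      # mean over years: (n_days, n_points)
    pad = window // 2
    # Assumption: season edges are padded by repeating the end values.
    padded = np.pad(cycle, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    smooth = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, padded)
    return z - smooth                           # broadcasts over the year axis

def eof_analysis(anom, lats, n_modes=4):
    """EOFs and PCs of an anomaly matrix via the singular value decomposition.

    anom: array (n_times, n_points); lats: latitude (degrees) of each point.
    Returns (eofs, pcs, variance_fraction) for the leading n_modes.
    """
    # Assumption: sqrt(cos(latitude)) weights, a common area weighting.
    weights = np.sqrt(np.clip(np.cos(np.deg2rad(lats)), 0.0, None))
    x = anom * weights
    x = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]          # coordinates in the reduced phase space
    eofs = vt[:n_modes]                         # spatial patterns (rows are orthonormal)
    variance_fraction = s**2 / np.sum(s**2)
    return eofs, pcs, variance_fraction[:n_modes]
```

In this sketch the anomalies would be reshaped to (n_years × n_days, n_points) before being passed to `eof_analysis`, and the returned PCs are the coordinates in which the clustering is carried out.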
 The k-means cluster analysis method [e.g., Michelangeli et al., 1995; Straus et al., 2007] is used to identify clusters in the reduced phase space. The clustering procedure aims to identify preferred regions of the phase space, which can be interpreted in the framework of regimes.
 The null hypothesis when applying the cluster analysis is that there are no regimes, and hence that the probability density function (pdf) of the underlying phase space follows a multi-normal distribution. In order to assess if this null hypothesis can be rejected, Monte Carlo simulations using a large number of synthetic data sets are applied, as in Straus et al. The cluster analysis is applied to 500 synthetic data sets, each one composed of independent Markov processes having the same lag-1 autocorrelation and skewness as the corresponding PC of the data set being tested. Significance is reported as the percentage of times the optimal variance ratio computed for a synthetic data set does not exceed the variance ratio obtained from clustering in the data set being tested. Large values of significance for a cluster partition therefore indicate that the given variance ratio is unlikely to have been found by chance in a data set with a multi-normal pdf, or simply as a result of skewness in the PCs that make up the data set.
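The significance test described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the study: the variance ratio is assumed here to be the mean squared distance of the centroids from the data mean divided by the mean squared distance of points from their own centroid; the surrogates are Gaussian AR(1) series that match only the variance and lag-1 autocorrelation of each PC (skewness matching is omitted); and all function names are hypothetical.

```python
import numpy as np

def kmeans(x, k, rng, n_init=10, n_iter=100):
    """Plain Lloyd's k-means; returns (centroids, labels) of the best restart."""
    best = None
    for _ in range(n_init):
        centroids = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Keep the old centroid if a cluster happens to be empty.
            new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        inertia = d.min(axis=1).sum()
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best[1], best[2]

def variance_ratio(x, centroids, labels):
    """Assumed definition: between-centroid variance / within-cluster variance."""
    between = np.mean(((centroids - x.mean(axis=0)) ** 2).sum(axis=1))
    within = np.mean(((x - centroids[labels]) ** 2).sum(axis=1))
    return between / within

def ar1_surrogate(pcs, rng):
    """Independent Gaussian AR(1) series with each PC's variance and lag-1 autocorrelation."""
    n, m = pcs.shape
    p = pcs - pcs.mean(axis=0)
    r = np.array([np.corrcoef(p[:-1, j], p[1:, j])[0, 1] for j in range(m)])
    sd = p.std(axis=0)
    noise = rng.normal(size=(n, m)) * sd * np.sqrt(1.0 - r**2)
    out = np.empty((n, m))
    out[0] = rng.normal(size=m) * sd
    for t in range(1, n):
        out[t] = r * out[t - 1] + noise[t]
    return out

def significance(pcs, k, n_synth=500, seed=0):
    """Percentage of surrogates whose variance ratio does not exceed that of the data."""
    rng = np.random.default_rng(seed)
    centroids, labels = kmeans(pcs, k, rng)
    observed = variance_ratio(pcs, centroids, labels)
    count = 0
    for _ in range(n_synth):
        synth = ar1_surrogate(pcs, rng)
        c, lab = kmeans(synth, k, rng)
        if variance_ratio(synth, c, lab) <= observed:
            count += 1
    return 100.0 * count / n_synth
```

A call such as `significance(pcs, k=4)` would then return a percentage comparable in meaning to the significance values quoted in the text, although the numbers from this simplified sketch should not be expected to reproduce them.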
Christiansen raised questions about the ability of k-means clustering to determine the correct number of clusters in a data set. The present study uses cluster analysis to define a metric for non-gaussian structures that can be applied to both observations and models. In the context of model evaluation and comparison the concerns over the ability of k-means clustering to find the correct number of clusters become less relevant. Fereday et al. reported difficulty defining unambiguous European/Atlantic circulation regimes. However, their technique involved finding clusters in mean sea level pressure, with no reduction of dimensionality, and sampling over 2-month seasons. Additional analysis using shorter seasons and varying the dimension of the clustering space suggests that the Fereday et al. sampling strategy is likely not sufficient to adequately sample such a large space.
 The model used in this study is the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecast System (IFS) Cycle 36r1. We present results from continuous atmosphere-only integrations of the IFS for the 45-year period 1962–2006, forced with observed sea surface temperatures and sea ice fields, integrated as part of the Athena Project [Jung et al., 2012; J. Kinter et al., manuscript in preparation, 2012]. Two configurations of the IFS are used, a high resolution configuration integrated at T1279 resolution and a low resolution configuration integrated at T159 resolution. These triangular truncations correspond to approximate grid spacings of 16 km and 125 km respectively. A horizontal resolution of T159 is typical of the atmospheric resolution of coupled climate models, whereas T1279 is ECMWF's operational deterministic forecast resolution and is more typical of NWP model resolution. The vertical discretization and most of the physical parameterization schemes are the same in the two configurations. These models therefore represent a relatively clean comparison between high and low horizontal resolutions. Lack of computing resources prevented corresponding integrations being made at intermediate resolutions.
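The correspondence between spectral truncation and approximate grid spacing quoted above can be checked with a short calculation. The sketch assumes a linear grid with 2(T + 1) longitude points around the equator for truncation T; the function name and the value used for the Earth's circumference are introduced here for illustration.

```python
EARTH_CIRCUMFERENCE_KM = 40075.0  # approximate equatorial circumference

def approx_grid_spacing_km(truncation):
    """Approximate equatorial grid spacing for a triangular truncation T.

    Assumption: a linear grid with 2 * (T + 1) longitude points at the equator.
    """
    n_lon = 2 * (truncation + 1)
    return EARTH_CIRCUMFERENCE_KM / n_lon

spacing_t1279 = approx_grid_spacing_km(1279)  # about 15.7 km
spacing_t159 = approx_grid_spacing_km(159)    # about 125.2 km
```

Under this assumption the two configurations differ by a factor of eight in linear grid spacing.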
 We compare the models to reanalysis data for the same 45 year period. The reanalysis data set used is a combination of the ECMWF 40-year reanalysis (ERA-40; 1962–1988) and the ECMWF interim reanalysis (ERA-Interim; 1989–2006). This is the same data the model configurations use for forcing. This data set is referred to as ERA.
 Prior to EOF analysis, data from both the ERA and the model data sets are interpolated onto a 2.5° × 2.5° latitude × longitude grid. This procedure ensures that variability is considered on the same range of horizontal scales in all cases. We show results from clustering in the phase space spanned by the leading 4 EOFs. The leading 4 EOFs account for around 50% of the variance in geopotential height in ERA and both model configurations.
3. Results and Discussion
Figures 1a–1d show composite height maps for the k = 4 cluster partition from ERA. The clusters are presented in order of their climatological frequency. The significance of this cluster partition is 99.8%, as determined by the test described in Section 2. Therefore these clusters are interpreted as circulation regimes in the European/Atlantic domain. We note that the significance values for the k = 2 and k = 3 cluster partitions are considerably lower (64.2% and 87% respectively). On this basis we exclude the possibility of cluster partitions with k < 4. The high level of significance in the k = 4 partition is also found when the clustering is applied in a larger 11-dimensional phase space, which retains around 80% of the total variance in geopotential height in the European/Atlantic domain. This is indicative of the robustness of the clusters in the ERA data set.
 We refer to clusters 1 and 4 (Figures 1a and 1d) as NAO+ and NAO− regimes respectively, as they are consistent with the spatial patterns of the positive and negative phases of the North Atlantic Oscillation [e.g., Woollings et al., 2010b; Hurrell and Deser, 2010]. The second most frequent cluster (Figure 1b) has a positive geopotential height anomaly centered over Scandinavia with negative anomalies to the east and the west. This pattern is referred to as a blocking (BL) regime. The third most frequent cluster (Figure 1c) consists of a positive geopotential height anomaly over the North Atlantic and a negative anomaly over Scandinavia and eastern Europe. This cluster is referred to as the Atlantic Ridge (AR) regime, although we note the pattern has much in common with the East Atlantic pattern [e.g., Wallace and Gutzler, 1981]. These four regimes are qualitatively similar to those found in other studies [Michelangeli et al., 1995; Cassou, 2008].
 Equivalent clusters diagnosed in the low resolution T159 model configuration are shown in Figures 1e–1h, again shown in order of their climatological frequency of occurrence. The climatological frequencies of occurrence of the T159 clusters are notably different from ERA, with more homogeneity in the T159 frequencies. The T159 cluster partition has a significance of 84.4%. This indicates that the T159 model has a more gaussian phase space pdf, and hence does not simulate the same type of regime behavior as seen in ERA. The low significance prevents the application of the regime interpretation of the clusters to the T159 results, hence the resulting patterns are referred to only as clusters.
 In addition, the spatial patterns of the T159 cluster centroids are rather different to those in ERA. The first cluster (Figure 1e) is consistent with the spatial pattern of the NAO+ cluster in ERA, although shifted to the North. The second cluster (Figure 1f) is very similar to the NAO− cluster in ERA, although with a southward shift to the centre of the positive anomaly. Although these patterns are both shifted, there is a reasonable physical basis for their interpretation as NAO+ and NAO−. This is not the case for the remaining clusters, which are difficult to classify in terms of the ERA clusters. Since these clusters do not bear any particular physical resemblance to any of the clusters in ERA they are simply referred to as Cluster 3 and Cluster 4 respectively.
 We define an error metric ϵ for each model cluster centroid as the length of the vector between the model centroid and the corresponding ERA centroid in a common phase space. The errors for the NAO+ and NAO− clusters in the T159 configuration are given in the first row of Table 1. The errors for Cluster 3 and Cluster 4 are not shown since we cannot objectively choose a corresponding ERA cluster. The third row of Table 1 shows ϵ (relative to the ERA data set) for each cluster computed from the NCEP 20th Century Reanalysis [Compo et al., 2011] ensemble mean data set. This provides a measure of the error one might expect to see simply due to differences between reanalysis systems. It is evident that the errors ϵ for the NAO+ and NAO− clusters in the T159 model are two orders of magnitude larger than those between the two reanalysis data sets.
The mean error for the T159 model configuration is the minimum possible error found by matching all permutations of ERA clusters with the T159 clusters.
 We can include all four T159 clusters in our assessment of error by using the mean error over all four clusters as an error metric. We then compute this mean error for all permutations of matching ERA clusters to T159 model clusters. Through this process the minimum possible mean error in T159 is determined, although we note that the particular permutation of matching ERA clusters to model clusters associated with the minimized mean error neglects the physical consistencies between the NAO+ and NAO− clusters in the T159 model and ERA. The minimum possible mean error is also two orders of magnitude larger than the error between two reanalysis systems.
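The centroid error and the permutation search described above can be sketched as follows. The function names are hypothetical, and centroids are assumed to be given as coordinate tuples in the common phase space.

```python
import math
from itertools import permutations

def centroid_error(model_centroid, reference_centroid):
    """Euclidean distance between two centroids in the common phase space."""
    return math.dist(model_centroid, reference_centroid)

def min_mean_error(model_centroids, reference_centroids):
    """Smallest mean centroid error over all one-to-one matchings.

    Returns (error, matching), where matching[i] is the index of the
    reference cluster paired with model cluster i.
    """
    k = len(reference_centroids)
    best_error, best_matching = float("inf"), None
    for matching in permutations(range(k)):
        error = sum(centroid_error(model_centroids[i], reference_centroids[j])
                    for i, j in enumerate(matching)) / k
        if error < best_error:
            best_error, best_matching = error, matching
    return best_error, best_matching
```

For k = 4 there are only 24 permutations, so an exhaustive search is inexpensive. As noted in the text, the matching that minimizes the mean error need not respect physically motivated pairings such as NAO+ with NAO+.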
 The clusters diagnosed from the high resolution T1279 model configuration are shown in Figures 1i–1l. The T1279 cluster partition has a significance of 98.6%, which is comparable to ERA. This suggests that like ERA, but unlike the T159 model configuration, the T1279 model configuration has a significantly non-gaussian phase space pdf, and that the regime interpretation of the clusters is valid. The spatial patterns of the T1279 clusters are very similar to observations, and one can easily see a one-to-one relationship between the model clusters and those from ERA. The error statistics for the T1279 clusters are given in the second row of Table 1. The errors in the T1279 clusters are almost two orders of magnitude smaller than the T159 errors, and remarkably are comparable to the errors found between the clusters produced from two different reanalysis data sets.
 The T1279 model configuration produces clusters in the same order of climatological frequency of occurrence as ERA. However, the T1279 model configuration does show an excess of the NAO+ and AR regimes compared to ERA, and a deficiency of around 2% in the climatological occurrence of the BL regime.
 The high significance and similarity of the spatial patterns of T1279 cluster centroids to those in ERA, coupled with the reasonable representation of climatological frequencies shows that the T1279 model configuration is capturing the regime behavior seen in ERA, whereas the T159 resolution configuration is not.
 Whilst having realistic spatial patterns and a high level of significance is an essential part of a realistic simulation of regime behavior, it is also necessary for a model to simulate the temporal characteristics of the regimes realistically in order to be considered a “good” model. Figure 2 shows the distribution of persistence (in days) of each regime in ERA and both model configurations. Persistence is represented as the number of times the given cluster persists for the given number of days. Since the spatial patterns of Cluster 3 and Cluster 4 in the T159 model configuration are different to any in ERA or the T1279 configuration, we cannot make direct comparisons of the persistence of these clusters, and therefore persistence is not shown for these clusters. However, it is clear that the T159 model makes too many short visits to the NAO+ and NAO− clusters (Figures 2a and 2d). This is consistent with the lower significance of the clusters in the low resolution configuration. The T1279 model configuration shows persistence similar to ERA for the NAO+, AR, and NAO− regimes. However, the T1279 configuration seems to have too many shorter visits to the BL regime, and a deficiency of longer visits. The number of short (up to 8 day) events in T1279 is 247, compared to 220 in ERA. This over-representation of short events is evident in Figure 2b, where the T1279 curve lies above the ERA curve on the left side of the dashed line marking persistence of 8 days. From Figure 2b it is clear that there is some under-representation of longer blocking events in the 8–13 day range, followed by what appears to be somewhat of a recovery in the 13–17 day range. The total number of long (greater than 8 day) events in T1279 is 41, compared to 54 in ERA. This indicates that there is still a deficit in longer blocking events in T1279, and that the apparent recovery in the 13–17 day range is likely undermined by both the deficit in 8–13 day events and a deficit in events lasting longer than 17 days. It is conceivable that the resulting deficiencies could be related to systematic biases in the representation of tropical intraseasonal variability in the T1279 model configuration [Jung et al., 2012]. Note that the sum of short and long events does not yield the climatological frequency, since an event that consists of multiple days is counted only once whereas the climatological frequency counts all days.
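The persistence statistics discussed above can be sketched as a run-length count over the daily cluster labels. This is an illustration only: the function names are hypothetical, and treating each winter separately (so that an event cannot span two seasons) is an assumption made here.

```python
from collections import Counter
from itertools import groupby

def persistence_counts(seasons, regime):
    """Count events of each duration for one regime.

    seasons: list of per-winter sequences of daily cluster labels.
    Returns a Counter mapping duration in days -> number of events.
    """
    counts = Counter()
    for labels in seasons:
        for value, run in groupby(labels):
            if value == regime:
                counts[len(list(run))] += 1
    return counts

def split_short_long(counts, threshold=8):
    """Number of events lasting up to `threshold` days and longer than it."""
    short = sum(n for days, n in counts.items() if days <= threshold)
    long_ = sum(n for days, n in counts.items() if days > threshold)
    return short, long_
```

Because each event is counted once regardless of its length, the number of events differs from the number of days spent in the regime, which is the sum of duration times count over all durations.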
 Another important aspect of a regime simulation is the preferred transitions between regimes. To this end we explore the probabilities of transitioning from a given regime into each of the other regimes. Table 2 shows these transition probabilities for ERA. Corresponding probabilities for the T1279 model configuration are given in Table 3. In general the T1279 model configuration reproduces the preferred transitions from ERA, with most probabilities within 0.05 of the ERA values. However, the model overly prefers transitions from the BL regime to the NAO+ regime over transitions to the NAO− regime, relative to observations.
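Transition probabilities of this kind can be estimated from the daily cluster labels as sketched below. The definition used here is an assumption: each probability is conditional on leaving the current regime, so days on which the regime persists are excluded and each row sums to one. The function name and the per-winter input layout are likewise hypothetical.

```python
def transition_probabilities(seasons, k):
    """Estimate probabilities of moving from each regime to each other regime.

    seasons: list of per-winter sequences of daily labels in range(k).
    Returns a k x k nested list; entry [a][b] is the fraction of departures
    from regime a that go to regime b (diagonal entries are zero).
    """
    counts = [[0] * k for _ in range(k)]
    for labels in seasons:
        for current, following in zip(labels, labels[1:]):
            if current != following:            # ignore days of persistence
                counts[current][following] += 1
    probabilities = []
    for row in counts:
        total = sum(row)
        probabilities.append([c / total if total else 0.0 for c in row])
    return probabilities
```

Pairing consecutive days within each winter separately avoids counting a spurious transition between the last day of one season and the first day of the next.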
 Using a clustering technique, four clusters are identified in the wintertime geopotential height field in ERA. These clusters are highly significant and are interpreted as circulation regimes in the European/Atlantic sector. The same analysis is also applied to high and low horizontal resolution configurations of an atmospheric model. The high resolution model configuration has clusters similar to those in ERA, with a high level of significance. The temporal characteristics of the clusters are similar to ERA, with the exception of the blocking cluster, which appears to be visited for more short periods and fewer longer periods than in ERA. This result may have relevance to the diagnosis of blocking frequency using standard techniques [Stan and Straus, 2007]. The low resolution model configuration does not have a realistic representation of regimes in the European/Atlantic sector. This configuration identifies the positive and negative phases of the NAO as clusters, but the significance of the clusters is low. This indicates that the low resolution model phase space pdf follows a more multi-normal distribution. The temporal characteristics of the NAO clusters show an over-representation of short visits to the NAO+ and NAO− clusters, which is consistent with the lower overall significance. It is likely that the superior performance of the T1279 model configuration results from more realistic orography [Jung et al., 2012] and also from more realistic representation of nonlinear Rossby wave breaking processes, which are known to be important in maintaining persistent anomalies [Woollings et al., 2008; Masato et al., 2012].
 Understanding gained from studies of low-dimensional dynamical systems suggests that the response to external forcing of a system with regimes is manifested primarily in changes to the frequency of occurrence of those regimes. This implies that a realistic simulation of regimes should be an important requirement from climate models. We have shown that a low resolution atmospheric model, with horizontal resolution typical of CMIP5 models, is not capable of simulating the statistically significant regimes seen in reanalysis, yet a higher resolution configuration of the same model simulates regimes realistically. This result suggests that current projections of regional climate change may be questionable. This finding is also highly relevant to regional climate modelling studies where lower resolution global atmospheric models are often used as the driving model for high resolution regional models. If these lower resolution driving models do not have enough resolution to realistically simulate regimes, then boundary conditions provided to the regional climate model could be systematically erroneous. It is therefore likely that the embedded regional model may represent an unrealistic realization of regional climate and variability. This study has shown a large improvement in the model simulation of regimes, corresponding to a large increase in horizontal resolution. It is not known, however, if there is a particular resolution threshold, above which this regime behavior can be simulated realistically, or if gradual improvements are to be expected from increased resolution.
 The models studied here used observed SSTs for boundary conditions. However, the coupled atmosphere–ocean models typically used for climate prediction have an interactive ocean model, complete with its own errors and biases. It seems unlikely that one would see such a large improvement moving from T159 to T1279 in a coupled scenario simply due to errors in the ocean model and the two-way interactions between the atmospheric and oceanic model components. The coupling process often involves a certain amount of model parameter tuning, which may also decrease the impact of improved atmospheric resolution noted here. However, there is evidence that with modest improvements to oceanic resolution one can reduce some of the large SST biases that affect global circulation [Scaife et al., 2011], suggesting improved atmospheric resolution may still provide considerable benefits.
 The technique applied in this study is a subtle but powerful method to understand variability in atmospheric models. The performance differences between the two models, and the associated implications, are large when assessed using this method. This is in contrast to the more modest differences one sees when more standard techniques (e.g., based on the first two moments of the models' pdfs) are applied [Jung et al., 2012]. The regime framework has the potential to allow us to better discriminate between “good” and “bad” models by going beyond standard diagnostics that implicitly assume a gaussian nature to variability, and attempting to understand more complex forms of variability.
 This work was funded by the Natural Environment Research Council TEMPEST (Testing and Evaluating Model Predictions of European Storms) project. Thanks to Jim Kinter, Thomas Jung and colleagues for their hard work on the Athena Project which enabled this research. We are also grateful to David Straus for helpful discussions regarding this work. The 20th Century Reanalysis V2 data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/. We thank two anonymous reviewers whose comments improved this paper.
 The Editor thanks the two anonymous reviewers for their assistance in evaluating this paper.