Previous statistical detection methods based partially on climate model simulations indicate that, globally, the observed warming lies very probably outside the natural variations. We use a simpler approach to assess recent warming at different spatial scales without making explicit use of climate simulations. It considers the likelihood that the observed recent clustering of warm record-breaking mean temperatures at global, regional and local scales may occur by chance in a stationary climate. Under two statistical null-hypotheses, autoregressive and long-memory, this probability turns out to be very low: for the global records lower than p = 0.001, and even lower for some regional records. The picture for the individual long station records is not as clear, as the number of recent record years is not as large as for the spatially averaged temperatures.
1. Introduction
 Global mean surface air temperature shows a positive trend in the 20th century [Brohan et al., 2006; Hansen et al., 2006; Smith and Reynolds, 2005]. After an initial “flat” development at a pre-industrial level, a clear warming took place from the 1910s to the 1940s, leveling off in the 1970s. Since 1980 temperatures show a positive trend of about 0.18°C per decade [Trenberth et al., 2007]. This temperature increase and its geographical patterns have been subject to detection studies [Hegerl et al., 1996; Barnett et al., 2005], which are partially based on sophisticated statistical analyses of observations and climate simulations. Here, we pursue a simpler and intuitive idea, more accessible to the non-expert, to estimate the likelihood that the recent observed warming is consistent with the natural variability.
 The increasing anthropogenic greenhouse forcing would, according to simple physical reasoning [Arrhenius, 1896], cause a clustering of record warm years at the end of the observed record, and in fact, the 13 warmest years in 1880–2006 have all occurred in or after 1990. A clustering of record years at the beginning of the record would be in contradiction to anthropogenic greenhouse forcing. The probability of this clustering occurring by chance at the end of the record can be estimated under different null hypotheses of the statistical characteristics of natural variability. The simplest of those is that the annual values of mean global temperatures are independent of one another. In this case, the probability p of the event E of finding at least 13 of the largest values of a sequence of 127 independent random numbers in the last 17 places (years 1990 to 2006) is $p(E) = \frac{114!\,17!}{127!\,4!} = 1.25 \times 10^{-14}$. Such clustering appears as an extremely improbable random event in a stationary climate.
 However, the annual surface air temperatures display a serial correlation, even in a stationary natural climate, due to processes occurring on the land surface, ocean, and cryosphere. We conceptualize this natural memory with two statistical models, namely “short term” and “long term” memory. Within the former, annual mean temperature is assumed to be the result of an autoregressive process, which displays an exponentially decaying auto-covariance function. In the latter, it is described by a long-term persistence process with a power-law decay of the auto-covariance function. Some processes that may be relevant in the climate context have been shown to display this type of behavior [Bunde et al., 2005; Rybski et al., 2006]. Although it is quite difficult to ascertain whether a short time series such as the observed global mean annual temperature obeys this type of behavior, the present approach can be readily applied to other more complex null models. It also offers the advantage that it can be extended to regional spatial scales using some of the longest individual station records.
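 The probability quoted above for independent annual values can be checked with a few lines of code. The sketch below assumes that event E is read as "the m largest values of the sequence all fall within its last n positions", which is the reading consistent with the closed-form expression; the function name `prob_top_m_in_last_n` is introduced here for illustration only.

```python
from math import comb

def prob_top_m_in_last_n(total: int, last: int, m: int) -> float:
    """Probability that the m largest of `total` independent, identically
    distributed continuous values all fall in the final `last` positions.

    Every set of m positions is equally likely to hold the m largest values,
    so the probability is C(last, m) / C(total, m).
    """
    return comb(last, m) / comb(total, m)

# 127 annual values (1880-2006), last 17 years (1990-2006), 13 warmest years
p = prob_top_m_in_last_n(total=127, last=17, m=13)
print(f"p(E) under independence: {p:.2e}")
```

 Writing out the two binomial coefficients and cancelling the common factor 13! gives 114! 17! / (127! 4!), the expression in the text.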
2. Data and Methods
 We have analyzed global, regional and long station temperature records: global annual mean temperature in the period 1880–2006 from three global temperature data sets (Hadley Centre-CRU [Brohan et al., 2006], NASA GISS [Hansen et al., 2006] and NOAA NCDC [Smith and Reynolds, 2005]); the regional annual temperature means in the period 1850–2006 from a spatial average of those grid-cells of the Hadley Centre-CRU data in 26 geographical regions [Giorgi and Bi, 2005]; and eleven long station records constructed by blending original time series kindly provided by Jones and Moberg, which end in year 2000, with the station data from year 2001 onwards provided by NASA GISS. Details are given in the auxiliary material.
 The parameters of the autoregressive models were estimated from the observed records in the period up to 1960 to limit the influence of the anthropogenic forcing. For all but two regional records (South Australia and Southern South America) an autoregressive model of order one (AR-1) would be adequate. For these two regional records, the Durbin-Watson test indicated the presence of autocorrelated residuals in an AR-1 process. However, this may be due to chance, as it occurs in only 2 out of 26 regional records. Estimation of confidence intervals for the lag-1 autocorrelation is based on bootstrap methods [Efron and Tibshirani, 1993]; details are given in the auxiliary material.
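 The estimation step can be illustrated with a minimal sketch. The bootstrap variant actually used is described only in the auxiliary material, so the code below assumes a parametric AR-1 bootstrap with percentile intervals as one plausible choice, and it runs on a synthetic 80-value series rather than on the observed records. All function names are hypothetical.

```python
import random
import statistics

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(x)
    dev = [v - mean for v in x]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def simulate_ar1(alpha, n, rng):
    """Stationary AR-1 series x[t] = alpha * x[t-1] + unit-variance white noise."""
    x = rng.gauss(0.0, 1.0 / (1.0 - alpha ** 2) ** 0.5)  # draw from stationary law
    out = []
    for _ in range(n):
        out.append(x)
        x = alpha * x + rng.gauss(0.0, 1.0)
    return out

def bootstrap_ci(x, n_boot=1000, level=0.95, seed=0):
    """Percentile interval for alpha from AR-1 surrogates (assumed scheme)."""
    rng = random.Random(seed)
    alpha_hat = lag1_autocorr(x)
    reps = sorted(
        lag1_autocorr(simulate_ar1(alpha_hat, len(x), rng)) for _ in range(n_boot)
    )
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return alpha_hat, lo, hi

# Synthetic stand-in for a record truncated at 1960 (e.g., 1880-1959)
series = simulate_ar1(0.6, 80, random.Random(1))
alpha_hat, lo, hi = bootstrap_ci(series)
print(f"alpha = {alpha_hat:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

 With records of this length the interval is wide, which is one reason for choosing a conservative range of α rather than a single value.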
 The second null-hypothesis is that the temperature records are realizations of a ‘long-range autocorrelation’ process [Cohn and Lins, 2005]. The power-law decay of the autocorrelation is characterized by $C(k) \sim k^{-\gamma}$, where k is the time lag. The value of γ is related to the fractional differencing parameter d by γ = 1 − 2d. For the process to be stationary d must lie between 0 and 0.5. The Whittle method [Shimotsu and Phillips, 2005] was used here to estimate its value. Different statistical tests of the stationarity of the global mean temperature have yielded conflicting results [Stern and Kaufmann, 2000]. The Whittle method gives values slightly larger than 0.5, even disregarding the period from 1960 onwards, for all three global records. Proxy-based reconstructions of the Northern Hemisphere annual temperature in the past millennium, with variations less affected by anthropogenic greenhouse forcing than in the 20th century, yield values for d in the range 0.32 to 0.54 [Rybski et al., 2006]. Considering these uncertainties, it will be assumed here that d is smaller than, but very close to, 0.5. It is noted that other more complex null models are possible, e.g., where both short-term and long-term persistence are present [Cohn and Lins, 2005]. The simultaneous estimation of both parameters becomes, however, more problematic. This will be pursued in further analyses.
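 The relation γ = 1 − 2d can be illustrated numerically. The sketch below assumes fractionally differenced white noise, ARFIMA(0, d, 0), as the concrete long-memory process; its theoretical autocorrelation obeys the recursion ρ(k) = ρ(k−1)(k−1+d)/(k−d), and the log-log slope at large lags should approach −γ. It illustrates the power-law decay only and is not the Whittle estimator; the function names are hypothetical.

```python
import math

def power_law_exponent(d):
    """Exponent gamma of the decay C(k) ~ k**(-gamma) for differencing parameter d."""
    return 1.0 - 2.0 * d

def fractional_noise_acf(d, max_lag):
    """Theoretical autocorrelation of ARFIMA(0, d, 0) for lags 0..max_lag."""
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho

d = 0.45
rho = fractional_noise_acf(d, 2000)

# Slope of log(rho) against log(k) between lags 1000 and 2000
slope = (math.log(rho[2000]) - math.log(rho[1000])) / (math.log(2000) - math.log(1000))
print(f"gamma = {power_law_exponent(d):.3f}, fitted decay exponent = {-slope:.3f}")
```

 For d close to 0.5 the exponent γ is close to zero, so the autocorrelation decays very slowly; this is the regime assumed for the global records.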
3. Global Mean Temperature
 We focus on the number of warm record years in the last 17 years of the record, as illustrated by Trenberth et al. However, this choice is not critical, as the conclusions remain robust when considering the last 10 to last 20 years of the record. Figure 1 shows the probability p, as a function of the lag-1 autocorrelation α or of the fractional differencing parameter d, that the clustering of high values fulfills the criterion that at least m of the largest values occupy the last 17 places. Log(p) under the autoregressive null-hypothesis exhibits a remarkably linear dependence on α up to very high values. It is not straightforward to determine which value of α would better represent the natural persistence of the global temperatures, i.e., without the effect of anthropogenic forcing. The last decades of the 20th century are probably too strongly affected. Temperature records in the decade 1940–1950 have been found to be distorted by changes in the measuring devices of sea-surface temperatures [Thompson et al., 2008], and temperature data in the late 19th century are burdened by higher uncertainties [Brohan et al., 2006]. To be conservative, i.e., risking an overestimation, a value of α in the range of 0.75 to 0.85 (auxiliary material) can be assumed. Within this range the probability of a random occurrence of event E is about $10^{-5}$ to $10^{-3}$, a higher likelihood than under the null-hypothesis of white noise, but still an extremely rare event. The dependence of log(p) on the value of the fractional differencing parameter d is not so steep in the range of 0 to 0.45, but p(E) is still quite low for values of d close to 0.5, yielding a probability for event E of about $10^{-3}$ as well. The results for the last n = 17 years can be considered typical. Figure 1 includes the range of probabilities obtained when considering the range from n = 10 to n = 20, for the cases α = 0.85 and d = 0.45 (auxiliary material).
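 How such probabilities can be obtained by simulation is sketched below for the autoregressive null-hypothesis. The sketch is a plain Monte Carlo estimate, not necessarily the procedure behind Figure 1, and it reads event E as "the m largest values all lie in the last 17 places". The demonstration uses m = 2 so that a few thousand surrogate series suffice; resolving probabilities of order $10^{-5}$ for m = 13 would require far more realizations. Function names and default arguments are hypothetical.

```python
import random

def simulate_ar1(alpha, n, rng):
    """Stationary AR-1 series x[t] = alpha * x[t-1] + unit-variance white noise."""
    x = rng.gauss(0.0, 1.0 / (1.0 - alpha ** 2) ** 0.5)
    out = []
    for _ in range(n):
        out.append(x)
        x = alpha * x + rng.gauss(0.0, 1.0)
    return out

def top_m_in_last(series, m, last=17):
    """True if the m largest values of the series all lie in its final `last` places.

    This holds exactly when at least m of the final values exceed the maximum
    of the earlier part of the series.
    """
    head_max = max(series[:-last])
    return sum(v > head_max for v in series[-last:]) >= m

def estimate_p(alpha, m, n_years=127, last=17, n_sim=5000, seed=0):
    """Monte Carlo estimate of p(E) for a stationary AR-1 process."""
    rng = random.Random(seed)
    hits = sum(
        top_m_in_last(simulate_ar1(alpha, n_years, rng), m, last)
        for _ in range(n_sim)
    )
    return hits / n_sim

p_white = estimate_p(alpha=0.0, m=2)  # independent years
p_ar = estimate_p(alpha=0.8, m=2)     # persistent years
print(f"m = 2: white noise p = {p_white:.4f}, AR-1 (alpha = 0.8) p = {p_ar:.4f}")
```

 The white-noise case can be compared with the exact combinatorial value, and the persistent case shows the qualitative effect discussed in the text: serial correlation makes clusters of extreme values more likely.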
4. Regional Temperature Records
 Series of record warm years can also be found in regional temperatures. Here, we consider broad geographical regions as defined previously [Giorgi and Bi, 2005] and discussed by Trenberth et al. The number of warm record years in the last 17 years is smaller in the regional series than in the global mean because amplitudes of natural variability at regional scales are larger. However, the persistence of the regional temperature series is also smaller (smaller α and d), and therefore the clustering of a smaller number of record warm years could, in principle, still be an even rarer event than the clustering observed in the global mean temperature.
 Some statistical characteristics of the regional records are given in the auxiliary material. Figure 2 displays the value of the lag-1 autoregressive parameter α and the log-probabilities of event E under the autoregressive null-hypothesis (numerical values are given in the auxiliary material). The log-probabilities of event E depend on α, on the number of missing values present in the observed records, and on the number of warm record years observed in the last 17 years. The value of α shows a slight but perceptible meridional gradient with lower values at high latitudes in the Northern Hemisphere and higher values in the regions adjacent to the Southern Ocean. The probabilities of event E occurring by chance under the autoregressive null-hypothesis, on the other hand, do not show a clear geographical pattern. For many of the 26 regions the likelihood of event E under the autoregressive null-hypothesis is quite low, especially in Western Europe, Africa and Central Asia, in the range $10^{-5}$ to $10^{-4}$, which is even lower than for the global mean temperature.
 The estimated fractional differencing parameter d for the regional records also shows a clear geographical pattern, with a tendency for higher values in the Southern Hemisphere (Figure 3). The interaction of this gradient with the geographical variation of the number of record years in the last 17 years yields no clear pattern. The probability of event E occurring by chance under the long-memory null-hypothesis is in general higher than in the autoregressive case. Nevertheless, in several regions in Africa and Eurasia, this probability lies between $10^{-3}$ and $10^{-2}$.
5. Individual Long Station Records
 The individual stations are geographically clustered in Europe. The results are summarized in Figure 4 and in the auxiliary material. Due to the local temperature noise, the values of α are lower than for the regional or global mean, but the number of recent record years is also not as large. Both factors tend to balance, yielding probabilities in the same range as for the regional records (Figure 4b). In general, the value of d for the stations and regional records lies within the range 0 to 0.3 (Figure 4c), in accordance with independent estimations [Bunde et al., 2005]. The results for the long-term-persistence null-hypothesis are qualitatively similar to the autoregressive case (Figure 4d).
6. Discussion and Conclusions
 Our assessment of the rarity of the event E depends on the underlying null hypothesis. The risk of erroneously rejecting the null hypothesis strongly depends on the assumption about the character of the memory, as is demonstrated for long-term trends by Cohn and Lins and for the occurrence of extremes by Bunde et al. This significance can change by orders of magnitude. Nevertheless, the clustering of warm years at the end of the observed record would appear to be an extremely rare event under either of the two assumptions about the memory that are considered here. For the global mean temperature, conservative values of the lag-1 autocorrelation yield very low likelihoods for the recent clustering of record warmth, and under the long-term-persistence model this probability is even lower. The analysis of the regional records indicates that for some this likelihood is lower than for the global mean. This may seem surprising, as global records should inherently contain less internal noise and thus a higher signal-to-noise ratio. Two factors are regionally at play: a smaller number of record years and a smaller statistical persistence, rendering lower likelihoods of random clustering. This conclusion also holds for the individual station records. For these, the number of recent record years and the persistence of the record tend to be smaller than for the spatially aggregated records. It should be noted that the estimation of α and d from the observed records could arguably be biased towards larger values (i.e., larger persistence) by the effects of anthropogenic greenhouse forcing, and therefore the estimated probabilities of occurrence of event E, under these two null hypotheses, could also be biased high.
 We thank Armin Bunde for his kind help with the long-term-persistence algorithms and A. Moberg and P. Jones for permission to use their data products. Funding by the Swiss National Science Foundation and by NCCR Climate is acknowledged.