The problems one encounters in estimating trends in climate temperature are well known. Specifically, climate researchers have become increasingly aware of problems caused by structural change (i.e. shifts and breaks), serial correlation and unit roots. The consequences of ignoring these problems include the danger of reporting a statistically significant trend in the data when, in fact, it is the spurious result of using a standard test with inflated size. Hence, more powerful tests that allow for structural change and are robust to the presence of serial correlation and unit roots are called for. Some of the first efforts to address these issues directly are the papers by Beran (1992), Bloomfield (1992), Bloomfield and Nychka (1992), and Woodward and Grey (1993), who first illustrated the large uncertainty (standard error) accompanying these problems of estimation. As the statistical analysis of trends has advanced, so too have the methods used to estimate trends in temperature.

Fomby and Vogelsang (2002) applied to climate temperature data a linear regression model developed in Vogelsang (1998) that is robust to both serial correlation and unit roots. Further advances were provided in Perron and Yabu (2009a). Among the first to use fractional differencing models to detect the presence of long memory in climate data were Bloomfield (1992), Bloomfield and Nychka (1992), and Smith (1993). Mills (2007) applied a parametric ARFIMA model, Baillie and Chung (2002) applied a multivariate ARFIMA model, and Gil-Alana (2003) applied a semiparametric frequency domain model. The problem of structural breaks was recognized and incorporated in trend estimation by Seidel and Lanzante (2004), Wu and Zhao (2007), Gil-Alana (2008a, 2008b), and Perron and Yabu (2009b).

The purpose of this note is to present and demonstrate the application of recently developed econometric methods of trend estimation using the HadCRUT3 global, and northern and southern hemisphere temperature data updated through 2009 as the illustrative example.

2. Prior results

Using several prominent temperature datasets, Fomby and Vogelsang (2002) estimated the global trend to be + 0.5 degrees Celsius per 100 years in data from the mid-1800s to 1998. Using the HadCRUT dataset, Jones and Moberg (2003) found that the global warming trend is not continuous, but occurs over two periods: 1920–1975 and from 1975 on (through 2001), implying a break in 1975. Seidel and Lanzante (2004) incorporated breaks in a piecewise linear model of global atmospheric temperature changes from 1900–2002. They found that most of the warming during the 1959–2001 period occurred at the time of an abrupt climate shift in 1975. Dai and Wang (2010) used several datasets for the periods 1958–2001 and 1979–2007, and found warming trends for the northern and southern hemispheres, generally increasing as the data grid moved toward the poles. Warming in the northern hemisphere was generally more intense. These studies are representative of the larger body of work that suggests there is indeed a recent warming trend in global temperatures.

3. Data, methodology, and new results

I analyze the HadCRUT3 global and hemispheric surface temperature data described in Brohan et al. (2006), updated through 2009. Graphs of these data are presented in Figure 1. The time period of my analysis is 1850–2009, or 160 years of annual data. I analyze the global (HadCRUT3G), northern (HadCRUT3N) and southern (HadCRUT3S) hemisphere series separately. Following the recommendation of Brohan et al. (2006), I use the unadjusted HadCRUT3 data (rather than the HadCRUT3v data). In this section, I present the results of several statistical tests. The derivation of each of these tests is rather lengthy and complex. To save space, I refer the reader to the published articles for complete details.

It is well known that structural breaks and shifts can weaken or even invalidate the results of statistical tests that fail to recognize them. In econometrics, the phrase ‘structural break’ (or ‘structural change’) refers to a systematic change like a war, depression, recession, or oil price shock. In climatology it would refer to things like a major change in industrial production and possibly other human activity that increases the amount of carbon and other ‘atmospheric pollutants’ (as the US EPA has recently decreed). Thus, as a prelude to other statistical tests, I first test for the existence of a single structural shift in the trend of the climate temperature data using the test of Perron and Yabu (2009b). This test builds upon the Feasible Quasi-Generalized Least Squares procedure developed in Perron and Yabu (2009a), discussed below, and is robust to the presence of stationary or integrated noise components, where the order of integration denotes the number of times a series must be differenced to achieve stationarity (here meaning having a constant mean and variance over time). The shift date is estimated by minimizing the sum of squared errors from a time series regression containing level and slope shift dummies. Using this test, a shift date of 1901 for HadCRUT3G, 1970 for HadCRUT3N and 1901 for HadCRUT3S was suggested.
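The date-selection step described above (minimizing the sum of squared errors over candidate shift dates) can be sketched roughly as follows. This is only a plain OLS grid search, not the Perron and Yabu (2009b) test statistic itself; the function name `estimate_break_date`, the 15% trimming of the sample ends, and the exact coding of the dummies are assumptions made for illustration.

```python
import numpy as np

def estimate_break_date(y, years, trim=0.15):
    """Grid-search the break date that minimizes the OLS sum of squared
    residuals of a trend regression with level- and slope-shift dummies.
    The trimming fraction is an illustrative assumption."""
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_ssr, best_tb = np.inf, None
    for tb in range(lo, hi):                     # tb = last pre-break observation
        level = (t > tb).astype(float)           # level-shift dummy
        slope = np.where(t > tb, t - tb, 0.0)    # slope-shift dummy
        X = np.column_stack([np.ones(T), t, level, slope])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        ssr = float(resid @ resid)
        if ssr < best_ssr:
            best_ssr, best_tb = ssr, tb
    # Report the calendar year of the last pre-break observation.
    return int(years[best_tb - 1]), best_ssr
```

With annual data, calling `estimate_break_date(y, np.arange(1850, 2010))` returns the year of the last pre-break observation together with the minimized sum of squared residuals.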

3.2. Unit root tests and fractional integration

Continuing my statistical analysis, I now apply unit root tests and estimates of fractional integration. I use the augmented Dickey-Fuller (ADF) test of Dickey and Fuller (1979) and Said and Dickey (1984) to test the null hypothesis that a time series y_{t} contains a unit root after allowing autocorrelation in the time series regression errors with j-lags of the changes in y_{t}:

Δy_{t} = α_{0} + α_{1}y_{t−1} + Σ_{i=1}^{j} γ_{i}Δy_{t−i} + ε_{t}    (1)

In my analysis, the maximum number of lags allowed is 9, with the lag length chosen by the Akaike (1974) information criterion (AIC). The AIC is a measure of the goodness-of-fit of a statistical model that penalizes the number of estimated parameters. A time trend can also be added if desired, but I do not add one. Including a time trend assumes there is a non-zero trend in the data and, if not appropriate, significantly lowers the power of the test. The ADF test is a t-test on α_{1}, the coefficient on the lagged level of y. A significant ADF t-test indicates that y_{t} is stationary (i.e. rejects the presence of a unit root).
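To make the mechanics concrete, a minimal sketch of an ADF regression with a constant, no trend, and AIC-based lag selection is given below. It is not the code used for the results reported here; the function name `adf_tstat` and the use of a common estimation sample across lag lengths are assumptions. The resulting t-statistic must be compared with Dickey-Fuller critical values (roughly −2.9 at the 0.05 level for the constant-only case), not with the standard Normal.

```python
import numpy as np

def adf_tstat(y, max_lags=9):
    """Return (t-statistic on alpha_1, chosen lag length) for an ADF
    regression with a constant and no trend; lags are picked by AIC."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                        # dy[i] = y[i+1] - y[i]
    idx = np.arange(max_lags, len(dy))     # common sample so AICs are comparable
    best = None
    for k in range(max_lags + 1):
        cols = [np.ones(len(idx)), y[idx]]               # constant, lagged level
        cols += [dy[idx - s] for s in range(1, k + 1)]   # k lagged changes
        X = np.column_stack(cols)
        z = dy[idx]
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
        resid = z - X @ beta
        ssr = float(resid @ resid)
        n, p = X.shape
        aic = n * np.log(ssr / n) + 2 * p
        if best is None or aic < best[0]:
            cov = ssr / (n - p) * np.linalg.inv(X.T @ X)
            best = (aic, beta[1] / np.sqrt(cov[1, 1]), k)
    return float(best[1]), best[2]
```

A strongly negative statistic points toward stationarity; a value near zero, as in the DF-ADF column of Table I, does not reject the unit root.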

If a time series contains a unit root, then it is non-stationary, non-mean reverting, and essentially non-predictable using only past values of the series itself. However, it is now well known that structural change can spuriously cause the appearance of a unit root. For this reason, I also apply the LM (Lagrange Multiplier) unit root t-test of Lee and Strazicich (2003, 2004), that allows a single endogenous (unknown) break. I apply their Model C, the most general model, that allows a single break in both the test regression intercept and slope at the same point in time. Lee and Strazicich note that failure to allow for a structural break under the null (as in the test of Zivot and Andrews (1992)) can distort the size of the unit root t-test. Their LM unit root test avoids this problem by allowing a structural break under both the unit root null and the alternative. I also apply the unit root test of Perron (1989), and Kim and Perron (2009) that allows a break at a known (estimated) point in time. I note that the Kim and Perron (2009) estimated break dates in Table I are nearly identical to those estimated by using the method of Perron and Yabu (2009b) above.

Table I. Unit root tests, 1850–2009

Series       DF-ADF    L&S-LM (Unknown break)    K&P (Known break)
HadCRUT3G    −0.43     −3.70                     −5.64* (1901)
HadCRUT3N    −0.65     −4.01                     −5.48* (1972)
HadCRUT3S    −0.12     −4.46                     −6.18* (1900)

Note: * denotes rejection of H(0): unit root at the 0.05 level. DF-ADF test includes constant only; L&S-LM and K&P tests allow break in constant and trend. Max number of lags allowed = 9. Known break date in ().

In Table I we see that the results are mixed. Both the Dickey-Fuller tests (with no break) and the Lee-Strazicich tests (with a single unknown break) fail to reject the presence of a unit root in the HadCRUT3 data. However, the tests of Kim and Perron (2009) using known break dates reject the presence of a unit root. Given these mixed results, I also estimate the order of fractional integration of the temperature data.

As shown in a number of previous studies (e.g. Kramer (1998)), the classic DF-ADF unit root test is not consistent against fractional alternatives, except in special cases. Fractional integration models allow one to test for the presence of non-integer orders of integration greater or less than 1.0, denoted I(d) meaning ‘integrated of order d.’ I present the results using the feasible exact local Whittle estimate (FELW) of Shimotsu and Phillips (2005) and Shimotsu (2010). The FELW estimator is a modification of the exact local Whittle estimator of Shimotsu and Phillips (2005) to allow an unknown mean and polynomial time trend, is consistent for , and has a limit distribution for [ with a polynomial trend]. The FELW method allows a larger range of consistent and normally distributed values and has a smaller standard error than several other methods of estimating d.

Diebold and Inoue (2001), and Granger and Hyung (2004), among others, show that structural change and I(d) models can be easily confused. Thus I also include a model of fractional integration that explicitly allows structural change. Smith (2005) presents a modification of a frequency domain estimate of long memory, the GPH estimate of Geweke and Porter-Hudak (1983), that allows slowly varying level shifts. Specifically, Smith shows that the standard GPH estimate is biased in the presence of level shifts. Using common notation, let I(ω_{j}) denote the sample periodogram evaluated at the j-th Fourier frequency ω_{j} = 2πj/T, j = 1, 2, …, m, where m is the bandwidth parameter and T is the total number of observations. The original Geweke and Porter-Hudak (1983) GPH estimator of d is then based on the OLS regression of the log-periodogram on the log frequency:

(2)

where d = − β_{1}.

Smith argues that adding an additional regressor to the GPH log-periodogram regression, − log(p^{2} + ω_{j}^{2}) where p is estimated as p_{T} = kj/T for some constant k > 0, reduces the bias caused by level shifts. Smith recommends setting k = 3. He then shows that this specification of the GPH regression model significantly reduces the bias, although with somewhat less precision. Smith suggests that the loss in precision is offset by the reduction in bias. He calls this new model the Modified GPH estimator.
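A rough sketch of the two log-periodogram regressions follows. Several details are assumptions rather than statements of Smith's (2005) implementation: the regressor is taken to be log(ω_{j}^{2}) so that d = −β_{1}, the bandwidth defaults to the square root of T, the index in p_{T} = kj/T is read as the bandwidth m, and the modified estimate of d is read off the coefficient on the original regressor. The function name `gph_estimates` is hypothetical.

```python
import numpy as np

def gph_estimates(y, m=None, k=3.0):
    """Return (standard GPH d, modified GPH d) from log-periodogram
    regressions over the first m Fourier frequencies."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if m is None:
        m = int(np.sqrt(T))                    # ASSUMPTION: bandwidth rule
    j = np.arange(1, m + 1)
    omega = 2.0 * np.pi * j / T                # Fourier frequencies
    dft = np.fft.fft(y - y.mean())[1:m + 1]
    log_I = np.log(np.abs(dft) ** 2 / (2.0 * np.pi * T))   # log-periodogram
    x = np.log(omega ** 2)                     # ASSUMPTION: chosen so d = -beta_1
    X = np.column_stack([np.ones(m), x])
    b = np.linalg.lstsq(X, log_I, rcond=None)[0]
    d_gph = -b[1]
    p = k * m / T                              # ASSUMPTION: p_T = k*m/T
    extra = -np.log(p ** 2 + omega ** 2)       # additional level-shift regressor
    X_mod = np.column_stack([np.ones(m), x, extra])
    b_mod = np.linalg.lstsq(X_mod, log_I, rcond=None)[0]
    d_mod = -b_mod[1]
    return float(d_gph), float(d_mod)
```

Because the extra regressor is highly correlated with the log frequency, the modified estimate is noticeably less precise, which is the bias-versus-precision trade-off described above.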

Table II presents the results of estimating fractional d using the feasible exact local Whittle (FELW) and the Modified GPH estimators. Again we see mixed results. The FELW estimates suggest nonstationary mean reversion (0.5 ≤ d < 1.0) in the case of HadCRUT3G and HadCRUT3S, and a unit root in the case of HadCRUT3N. The Modified GPH estimates suggest stationary mean reversion (d < 0.5) in the case of HadCRUT3N and HadCRUT3S, and nonstationary mean reversion in the case of HadCRUT3G. However, all three 95% confidence intervals contain zero. We also see that the FELW estimates are uniformly higher than the Modified GPH estimates. This suggests that the well-known effect of neglecting to allow structural change has indeed biased the FELW estimates toward nonstationarity.

Table II. Fractional integration estimates, 1850–2009

In this section, I will present several tests for the existence of a trend in the global and hemispheric climate temperature data. All tests for the significance of the temperature trend (t) discussed in this section refer to the standard OLS linear regression model:

y_{t} = β_{1} + β_{2}t + u_{t}    (3)

that is subsequently modified as discussed in the articles developing the respective tests.

In Table III, I present the results of the trend tests of Bunzel and Vogelsang (2005). Vogelsang (1998) presented a size robust trend analysis methodology that is valid in the presence of general forms of serial correlation and a unit root in the errors of the trend function. As noted earlier, Fomby and Vogelsang (2002) used this methodology on several major global climate temperature datasets. Bunzel and Vogelsang (2005) improved upon this test using the fixed-b asymptotic framework developed in Kiefer and Vogelsang (2005). Bunzel and Vogelsang (2005) present two new forms of the original t-PS test of Vogelsang (1998) for the significance of the trend estimate from Vogelsang (1998): the Dan-J test for stationary errors and the Dan-BG test for unit root errors. I note that in these tests, there is no allowance for structural shifts.

Table III. Bunzel-Vogelsang (2005) trend tests, 1850–2009

Series       Trend coefficient    t-test    95% Upper    Error model
HadCRUT3G    0.0044               0.725     0.0189       t-Dan-BG
HadCRUT3N    0.0045               1.017     0.0150       t-Dan-BG
HadCRUT3S    0.0043               0.682     0.0195       t-Dan-BG

Note: All t-tests fail to reject H(0): β_{2} ≤ 0. 95% C.V. for t-Dan-BG = 2.391.

The tests in Table III are one-sided tests for β_{2} ≤ 0, so only the 95% upper bound is included. These tests do not apply prewhitening of the regression errors, and unit root tests on the regression errors indicate that the Dan-BG form of the significance test is appropriate for all three temperature series. All tests fail to reject the null at the 0.05 level of significance.

I also apply the test of Perron and Yabu (2009a) for the slope of the trend function of the standard model when it is a priori unknown whether the time series is trend stationary or contains a unit root. Using what they call a Feasible Quasi-generalized Least Squares method, Perron and Yabu are able to show that inference on the slope parameter can be performed using the standard Normal distribution. The Feasible Quasi-generalized Least Squares method used by Perron and Yabu is the classic Cochrane-Orcutt procedure with an estimate of the sum of the autoregressive coefficients truncated to take a value of 1.0 when the usual estimate is in the neighborhood of 1.0. Hence it is super-efficient when the true value is 1.0, and the limiting distribution of the test does not depend on whether the regression error is strictly I(0), covariance stationary, or I(1) unit root. Its distribution is thus standard Normal, and the two-sided 95% critical value is ± 1.96. Perron and Yabu suggest that their test has better size and power properties than the previous tests of Bunzel and Vogelsang (2005), and Harvey et al. (2007).
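The truncation idea can be illustrated with a heavily simplified sketch. It assumes AR(1) errors only, uses a neighborhood of width T^{−1/2} around 1.0 as the truncation rule, keeps the first observation in levels, and omits any bias correction of the autoregressive estimate; all of these, along with the function name `fqgls_trend_test`, are assumptions made for illustration and not a reproduction of the Perron and Yabu (2009a) procedure.

```python
import numpy as np

def fqgls_trend_test(y, delta=0.5):
    """Simplified AR(1) sketch: Cochrane-Orcutt quasi-differencing with the
    AR coefficient set to exactly 1 when its estimate is close to 1.
    Returns (slope estimate, t-statistic, AR coefficient used)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1, dtype=float)])
    # Step 1: OLS detrending, then an AR(1) estimate from the residuals.
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b_ols
    alpha = float(u[1:] @ u[:-1] / (u[:-1] @ u[:-1]))
    # Step 2: truncation (ASSUMPTION: neighborhood of width T**(-delta)).
    if abs(alpha - 1.0) < T ** (-delta):
        alpha = 1.0
    # Step 3: quasi-difference; the first observation stays in levels.
    y_q = np.concatenate([y[:1], y[1:] - alpha * y[:-1]])
    X_q = np.vstack([X[:1], X[1:] - alpha * X[:-1]])
    b = np.linalg.lstsq(X_q, y_q, rcond=None)[0]
    e = y_q - X_q @ b
    s2 = float(e @ e) / (T - 2)
    cov = s2 * np.linalg.inv(X_q.T @ X_q)
    t_stat = b[1] / np.sqrt(cov[1, 1])
    return float(b[1]), float(t_stat), alpha
```

When the coefficient is truncated to 1.0 the regression is effectively run in first differences, and otherwise in quasi-differences; in both cases the slope t-statistic is compared with ± 1.96.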

In Table IV, I present the results of these tests. I note that these are two-sided tests for β_{2} = 0, so both the 95% lower and upper bound are included. Table IV suggests that all tests fail to reject the null at the 0.05 level.

Table IV. Perron and Yabu (2009a) trend tests, 1850–2009

Series       Trend coefficient    t-test    95% Lower    95% Upper
HadCRUT3G    0.0055               1.535     −0.0015      0.0126
HadCRUT3N    0.0057               1.760     −0.0006      0.0120
HadCRUT3S    0.0054               1.823     −0.0004      0.0112

Note: All t-tests fail to reject H(0): β_{2} = 0. 95% C.V. = ± 1.96.

So far, using the complete data for 1850–2009, I am unable to reject a null hypothesis of a flat (zero-beta) slope for the temperature series for the global, northern and southern hemispheres. However, that is not the end of the story. I noted above that a failure to account for the presence of a structural break can severely bias the results of statistical tests. Therefore, I will now present two variations of such a test.

3.5. Trend tests allowing a single trend shift

I noted above that the trend shift test of Perron and Yabu (2009b) suggests shifts in 1901 for HadCRUT3G and HadCRUT3S, and 1970 for HadCRUT3N. In Table V I present pre- and post-break tests for these dates using the trend test of Perron and Yabu (2009a). One can also allow two or more breaks or shifts. However, with only 160 data points, allowing two or more breaks presents the possibility of segments of 50 time periods or less which would significantly increase the sampling error of the trend estimates. In Table V we see that in all three cases the pre-break slopes cannot reject the null hypothesis of β_{2} = 0, while the post-break tests all show a rejection of the null at the 0.05 level.
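The pre- and post-break comparison can be illustrated with a simple split-sample fit. The sketch below uses plain OLS on each segment instead of the Perron and Yabu (2009a) test, so it shows only how the sample is divided and how few observations each segment retains; the function name `segment_slopes` is hypothetical.

```python
import numpy as np

def segment_slopes(y, years, break_year):
    """Fit separate OLS trend lines up to and after break_year.
    Returns (pre-break slope, post-break slope, n_pre, n_post)."""
    y = np.asarray(y, dtype=float)
    years = np.asarray(years, dtype=float)
    pre = years <= break_year
    slope_pre = np.polyfit(years[pre], y[pre], 1)[0]      # slope per year
    slope_post = np.polyfit(years[~pre], y[~pre], 1)[0]
    return float(slope_pre), float(slope_post), int(pre.sum()), int((~pre).sum())
```

For the 1850–2009 sample, a 1901 break leaves 52 pre-break and 108 post-break observations, which shows why adding further breaks quickly produces short segments.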

Table V. Perron-Yabu (2009a) trend tests, 1850–2009

Columns: Break date; Pre-break coefficient, t-test, 95% Lower, 95% Upper; Post-break coefficient, t-test, 95% Lower, 95% Upper.

Note: * denotes reject H(0): β_{2} = 0. 95% C.V. = ± 1.96. We use Model 3 which allows break in constant and trend.

As noted in Section 2 above, prior research has suggested a major shift in climate temperature in 1975. For this reason, I also replicate Table V using a break date of 1975 in Table VI. In Table VI we see that the results are very similar to those in Table V, with one exception. In addition to suggesting the rejection of the zero-slope null at the 0.05 level for all the post-break data, there is also a rejection for the pre-break HadCRUT3S data. Thus, regardless of whether I use a break date of 1901 or the 1970s, my analysis consistently suggests a post-break date warming trend.

Table VI. Perron-Yabu (2009a) trend tests for 1975 break date, 1850–2009

Columns: Break date; Pre-break coefficient, t-test, 95% Lower, 95% Upper; Post-break coefficient, t-test, 95% Lower, 95% Upper.

Note: * denotes reject H(0): β_{2} = 0. 95% C.V. = ± 1.96. We use Model 3 which allows break in constant and trend.

In this note, I have presented and illustrated recently developed econometric trend tests using the HadCRUT3 global and hemispheric surface temperature data updated through 2009, specifically allowing for the statistical complications of structural change, serial correlation and unit roots. It is hoped that climate researchers will find these methods useful in their own research on climate change. My results confirm the general finding of earlier studies: the HadCRUT data present a consistent pattern of warming in recent years (post-1975). Notably absent from this univariate analysis is any attempt to consider the influence of possible causal variables discussed in the climatological literature, or to extrapolate the estimated trends into the future. A short note such as this cannot possibly do justice to this rich and varied literature. This is clearly a topic for future research.

Acknowledgements

The author thanks the Editor and two anonymous reviewers for helpful and constructive comments. Gauss and Matlab computer code for some of the statistical calculations was graciously provided by Junsoo Lee, Pierre Perron, Katsumi Shimotsu, Aaron Smith and Tim Vogelsang. The usual disclaimer applies.