• Open Access

Panel and multivariate methods for tests of trend equivalence in climate data series



This article is corrected by:

  1. Errata: Corrigendum Volume 12, Issue 4, 386–388, Article first published online: 7 October 2011


We explain panel and multivariate regressions for comparing trends in climate data sets. They impose minimal restrictions on the covariance matrix and can embed multiple linear comparisons, which is a convenience in applied work. We present applications comparing post-1979 modeled and observed temperature trends in the tropical lower- and mid-troposphere. Results are sensitive to the sample length. In data spanning 1979–1999, observed trends are not significantly different from zero or from model projections. In data spanning 1979–2009, the observed trends are significant in some cases but tend to differ significantly from modeled trends. Copyright © 2010 Royal Meteorological Society

1. Introduction

Many issues of interest in climate analysis involve comparisons of trends across different data sets. This note explains regression-based methods that yield asymptotically valid parameter variances and covariances while providing a flexible testing framework. Obtaining linear trend coefficients is easy using ordinary least squares (OLS). Obtaining unbiased estimates of the parameter variances and covariances (collectively referred to as the covariance matrix) is more challenging, because the regression residuals may be autocorrelated within each panel, and both heteroskedastic (unequal variance) and correlated across panels. Regressions that use sequenced groups of time series observations are referred to as panel estimators (Davidson and MacKinnon, 2002). They are convenient when panels are unbalanced, i.e. they do not all have the same numbers of observations, but they impose restrictions on the covariance matrix. A nonparametric method introduced by Vogelsang and Franses (2005) handles autocorrelation of unknown dimension; however, it is only applicable to balanced panels.

We explain both methods and the trade-offs between them. In Section 3, we apply them to a comparison of model temperature projections and observations in the tropical troposphere. We test trend significance as well as model-data equivalence. For discussions of the importance of modeling and climatological measurement issues related to the tropical atmosphere, see Karl et al. (2006), Santer et al. (2005, 2008), and Douglass et al. (2007).

2. Methods

2.1. Introduction: two-equation case

We assume that the data are stationary, though autocorrelated, upon detrending; in other words, ‘trend stationary.’ Suppose that there are two series of interest, $y_{1\tau}$ and $y_{2\tau}$, where τ = 1, …, T. Trends are fitted using

$$y_{1\tau} = a_1 + b_1\tau + u_{1\tau} \qquad (1)$$

$$y_{2\tau} = a_2 + b_2\tau + u_{2\tau} \qquad (2)$$

A Student's t test of slope equivalence is

$$t = \frac{\hat b_1 - \hat b_2}{\sqrt{s^2_{\hat b_1} + s^2_{\hat b_2} - 2\,\mathrm{cov}(\hat b_1, \hat b_2)}} \qquad (3)$$

where ∧ denotes an OLS estimate, $s^2_{\hat b_i}$ (i = 1, 2) denotes an autocorrelation-robust variance estimator for $\hat b_i$, and cov($\hat b_1$, $\hat b_2$) is the estimated covariance between the trend terms.
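As a minimal sketch of the slope-equivalence score, assuming Equation (3) takes the usual form (difference of slopes divided by the square root of the two variances minus twice the covariance), the calculation can be written as below. The function name and all numerical inputs are hypothetical and serve only to show how a positive covariance term changes the score.

```python
import math


def slope_equivalence_t(b1, b2, var_b1, var_b2, cov_b12):
    """t-type score for H0: b1 == b2 (assumed form of Equation (3)).

    var_b1, var_b2 and cov_b12 are taken as given; in practice they
    would come from an autocorrelation-robust covariance estimator.
    """
    var_diff = var_b1 + var_b2 - 2.0 * cov_b12
    if var_diff <= 0:
        raise ValueError("variance of the slope difference must be positive")
    return (b1 - b2) / math.sqrt(var_diff)


# Hypothetical trend estimates and (co)variances, for illustration only.
t_no_cov = slope_equivalence_t(0.25, 0.10, 0.004, 0.005, 0.0)
t_with_cov = slope_equivalence_t(0.25, 0.10, 0.004, 0.005, 0.002)
print(round(t_no_cov, 3), round(t_with_cov, 3))
```

With these illustrative inputs, omitting a positive covariance understates the score, which is why the covariance term matters for the comparison.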

Karl et al. (2006) drew attention to an apparent discrepancy between observed and model-generated temperature trends in the tropical atmosphere. Douglass et al. (2007) tested surface-matched differences (Supporting Information) using

$$\frac{\hat b_1 - \hat b_2}{s_{\hat b_1}} \qquad (4)$$

where $\hat b_1$ denotes the trend through model ensemble means, $\hat b_2$ denotes the trend through observations, and $s_{\hat b_1}$ is the estimated standard error of $\hat b_1$. The test (4) incorrectly treats the observations as deterministic and assumes the model observations are independent across time. Santer et al. (2008) instead used

equation image(5)

where ∼ denotes a least-squares estimate and $r_i$ denotes the first-order autoregressive (AR1) coefficient in series i. The ratio of AR1 terms is commonly referred to as an ‘effective degrees of freedom’ adjustment (Santer et al., 2000). Instead of providing T independent observations, a series is said to provide only $(1 - r_i)T/(1 + r_i)$ independent observations. The resulting variance corresponds to an estimate obtained using an AR1 model, but is not equivalent to that derived from higher order autocorrelation models. In addition, it does not yield a correct 2cov($\hat b_1$, $\hat b_2$) term (Supporting Information), which was missing in both Equations (4) and (5). While detrended climate model projections may be uncorrelated with observations, the assumption of no covariance among trend coefficients implies that models have no low-frequency correspondence with observations in response to observed forcings, which seems overly pessimistic.
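The ‘effective degrees of freedom’ adjustment described above can be illustrated with a short sketch. The helper names are hypothetical, and the lag-1 coefficient is computed here as a simple sample autocorrelation, which is an assumption rather than the estimator used in the studies cited.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series (simple moment estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


def effective_sample_size(T, r):
    """Effective number of independent observations, T(1 - r)/(1 + r)."""
    return T * (1.0 - r) / (1.0 + r)


# Illustrative values: 372 monthly observations with r = 0.8.
n_eff = effective_sample_size(372, 0.8)
print(round(n_eff, 1))
```

For a strongly autocorrelated monthly series, the adjustment reduces several hundred observations to a few dozen effective ones, but it only reflects first-order autocorrelation.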

2.2. Panel regressions

Equation (3) can be obtained using a panel regression. Suppose that the dependent variable is the stacked vector (y1, y2)′, and we estimate the following equation:

$$y = a_1 (1\ 1)' + a_2 (0\ 1)' + b_1 (\tau\ \tau)' + b_2 (0\ \tau)' + e \qquad (6)$$

(1 1)′ denotes two stacked T-length vectors of ones. (0 1)′ denotes a vector of T zeros stacked on T ones. This is called an indicator or a ‘dummy variable,’ since it takes the value 1 when the dependent variable is $y_2$. (τ τ)′ denotes a 2T-length vector consisting of two T-length time trends and (0 τ)′ is (τ τ)′ times (0 1)′. A test of $b_2 = 0$ in Equation (6) can be shown to be equivalent to testing equality of the slopes, $b_1 = b_2$, in Equations (1) and (2) (Kmenta 1986; Supporting Information). Hence, the t-statistic on $b_2$ in Equation (6) yields the test score (3).
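The equivalence between the stacked regression and the two separate trend fits can be checked numerically. The sketch below uses simulated data (the slopes, intercepts, and noise levels are arbitrary assumptions) and plain least squares via NumPy. It shows that the coefficient on the interaction term (0 τ)′ equals the difference between the two separately estimated slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120
tau = np.arange(1, T + 1, dtype=float)

# Two simulated series with different linear trends (illustrative values).
y1 = 0.5 + 0.020 * tau + rng.normal(scale=0.3, size=T)
y2 = -0.2 + 0.008 * tau + rng.normal(scale=0.3, size=T)


def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Separate trend regressions, one per series.
X_single = np.column_stack([np.ones(T), tau])
b1_sep = ols(X_single, y1)[1]
b2_sep = ols(X_single, y2)[1]

# Stacked regression: constant, dummy, trend, dummy * trend.
dummy = np.concatenate([np.zeros(T), np.ones(T)])
trend = np.concatenate([tau, tau])
X_panel = np.column_stack([np.ones(2 * T), dummy, trend, dummy * trend])
coef = ols(X_panel, np.concatenate([y1, y2]))
slope_base, slope_shift = coef[2], coef[3]

print(np.isclose(slope_shift, b2_sep - b1_sep))
```

The point estimates coincide; what the panel framework adds is a single covariance matrix from which the standard error of the slope difference can be read directly.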

To generalize the framework further, suppose that we are comparing m model-generated series and o observational series, making the total number of series N = m + o. Each source i yields $T_i \le T$ nonmissing observations $y_{i\tau}$ over the interval τ = 1, …, T. Define an indicator variable $obs_{i\tau}$ = 0 if the record is model generated, and = 1 if it is from an observational series. Denote the ith vector as $y_i = [y_{i1}, …, y_{iT}]$. Stack these vectors into a single NT × 1 vector y as follows:

$$y = [y_1, y_2, \ldots, y_N]' \qquad (7)$$

Stack the trend vector τ′ = [1, …, T] N times to get the NT × 1 panel trend vector

equation image(8)

The indicator, or the dummy, variables are likewise stacked to form

equation image(9)

where $obs_i$ is $(obs_{i1}, …, obs_{iT})'$. The regression equation is then written as

$$y = b_0 + b_1\tau + b_2\,(obs \times \tau) + b_3\,obs + e \qquad (10)$$

where τ and obs denote the stacked NT × 1 trend and indicator vectors, and e is an NT × 1 residual vector with typical element $e_{i\tau}$. Note that all the ‘data’ are on the left-hand side, and the right-hand side consists of dummy variables and trend terms.

When $obs_{i\tau} = 0$, $dy_{i\tau}/d\tau = b_1$, and when $obs_{i\tau} = 1$, $dy_{i\tau}/d\tau = b_1 + b_2$. Thus, a t-statistic on $b_1$ tests whether the model trend is zero, and a test of the linear restriction $b_1 + b_2 = 0$ indicates the significance of the observed slope. The t-statistic on $b_2$ tests whether the trend in observations differs significantly from the trend in models.
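A small sketch of the stacked design may help fix ideas. It assumes the regressors are ordered as constant, trend, indicator × trend, indicator (this ordering and the helper name `panel_design` are assumptions made for illustration), and uses noise-free toy series so that the group slopes can be verified exactly.

```python
import numpy as np


def panel_design(T, obs_flags):
    """Stacked design matrix for balanced panels of length T.

    Columns (assumed order): constant, trend, obs * trend, obs.
    obs_flags holds 0 for a model series and 1 for an observational one.
    """
    tau = np.arange(1, T + 1, dtype=float)
    trend = np.tile(tau, len(obs_flags))
    obs = np.repeat(np.asarray(obs_flags, dtype=float), T)
    return np.column_stack([np.ones_like(trend), trend, obs * trend, obs])


T = 50
tau = np.arange(1, T + 1, dtype=float)

# Toy data: two "model" series and one "observed" series, no noise.
slopes = [0.2, 0.3, 0.1]
obs_flags = [0, 0, 1]
y = np.concatenate([s * tau for s in slopes])

X = panel_design(T, obs_flags)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

model_trend = b[1]            # slope when obs = 0
observed_trend = b[1] + b[2]  # slope when obs = 1
print(round(model_trend, 3), round(observed_trend, 3))
```

In this balanced, unweighted setting the pooled model slope is the average of the individual model slopes, and the interaction coefficient is the observed-minus-model difference.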

Equation (10) can be extended further. Suppose that observations come from two different systems, such as satellites and weather balloons. Define two different indicator variables: d1, which is equal to 1 if an observation is from either system 1 or 2, and d2, which is equal to 1 only if the observation is from system 2. The regression equation becomes

$$y = b_0 + b_1\tau + b_2\,(d1 \times \tau) + b_3\,d1 + b_4\,(d2 \times \tau) + b_5\,d2 + e \qquad (11)$$

The estimated model trend is $\hat b_1$. The trend in observations from system 1 is $\hat b_1 + \hat b_2$ and from system 2 is $\hat b_1 + \hat b_2 + \hat b_4$. The t-statistic on $\hat b_4$ tests whether the trend in the second observation system differs from that in the first, and so forth.

Hypothesis testing requires a valid estimator of V(b), the covariance matrix of b. The general form is (Davidson and MacKinnon 2002)

$$V(b) = (X'X)^{-1} X' \Omega X (X'X)^{-1} \qquad (12)$$

where X denotes the right-hand side variables in Equation (11) and Ω = E(ee′). Obtaining a valid estimate of Ω requires modeling the cross- and within-panel covariances. For a panel i with T observations, define a matrix Ai of AR weights using the panel-specific AR1 coefficient ρi:

equation image(13)
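To make the role of Ω concrete, the sketch below evaluates the general covariance form $(X'X)^{-1}X'\Omega X(X'X)^{-1}$ for a single trend regression under two error models. The AR weight matrix is assumed here to have elements $\rho^{|j-k|}$, the standard AR1 correlation pattern; this may differ in scaling from the matrix defined in Equation (13). The function names and parameter values are hypothetical.

```python
import numpy as np


def ar1_matrix(T, rho):
    """T x T matrix with (j, k) element rho ** |j - k| (assumed AR1 weights)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sandwich_cov(X, omega):
    """Covariance of OLS coefficients: (X'X)^-1 X' Omega X (X'X)^-1."""
    bread = np.linalg.inv(X.T @ X)
    return bread @ X.T @ omega @ X @ bread


T = 60
tau = np.arange(1, T + 1, dtype=float)
X = np.column_stack([np.ones(T), tau])
sigma2 = 0.04  # illustrative error variance

V_iid = sandwich_cov(X, sigma2 * np.eye(T))
V_ar1 = sandwich_cov(X, sigma2 * ar1_matrix(T, 0.8))

# Ratio of trend standard errors: AR1 errors versus independent errors.
se_ratio = np.sqrt(V_ar1[1, 1] / V_iid[1, 1])
print(se_ratio > 1.0)
```

With independent errors the expression collapses to the textbook $\sigma^2 (X'X)^{-1}$; with positive autocorrelation the trend standard error is larger, which is why ignoring Ω overstates significance.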

Then a model of Ω can be written as

equation image(14)

where $\sigma_{ij}$ denotes the covariance between series i and j, $I_i$ denotes an identity matrix with dimension T, and $\sigma_i^2$ denotes the variance of series i. There are N(N − 1)/2 covariances $\sigma_{ij}$ in Equation (14) that need to be estimated, in addition to the variances and AR1 parameters. If some panels j are shorter than others ($T_j < T$), then the dimensions of the $A_i$ matrices need to be adjusted accordingly. Some commercial statistical packages, such as STATA, can accommodate unbalanced data sets.

2.3. Higher order autocorrelations and multivariate trend models

Vogelsang and Franses (2005, herein VF05) derived two estimators for Ω that, unlike Equation (14), impose no parametric restrictions on the lag and correlation structure. Suppose that the N panels are used one at a time in Equation (1), yielding OLS trend estimates $\hat b_i$, i = 1, …, N. Take the N residual series $u_{1\tau}, …, u_{N\tau}$ and form the T × N matrix U = [$u_{1\tau}$, …, $u_{N\tau}$]. VF05 derive two transformations of U that converge in probability to a scalar multiple of Ω. Of their two estimators, we focus on the F2 form, which has higher power and is slightly easier to compute. It is obtained as follows. Denote V = U′ and take the columns $v_j$, for j = 1, …, T, each of length N. Define a vector equation image. Then, VF05 show that

equation image(15)

converges in probability to an unbiased estimate of Ω, regardless of the form of autocorrelation and other departures from the independence assumption. For testing purposes, linear restrictions on the slopes can be written in the matrix form Rb = 0 (Supporting Information). The VF05 test statistic is

equation image(16)

where $\eta = \sum_{t=1}^{T} (t - \bar t)^2$ and q is the number of restrictions, which in our examples is always equal to 1. Critical values for Equation (16) generated by Monte Carlo simulation are reported in VF05.

The VF05 approach improves on the panel method by providing robust trend variances and covariances regardless of the autocorrelation order and the structure of heteroskedasticity. However, it requires balanced panels, which can be a limitation in some cases.

The VF05 statistic, as with all test statistics, has improved size as the sample size increases. Rejection probabilities also increase as ρ → 1. Monte Carlo simulations in VF05 show that for T = 100, when q = 1 and ρ > 0.8, just under 10% of F2 scores exceed the 95th percentile, indicating a tendency to over-reject a true null, although this is an improvement compared to earlier alternatives. Each panel in our full sample has well over 100 observations, but a high ρ value. Hence, VF05 scores that are close to the critical values may overstate significance.

3. Empirical application

3.1. Data

We used the same archive of climate model simulations as used by Santer et al. (2008). The available group now includes 57 runs from 23 models. Each source provides data for both the lower troposphere (LT) and mid-troposphere (MT). Each model uses prescribed forcing inputs up to the end of the twentieth century climate experiment (20C3M; Santer et al., 2005). Projections forward use the A1B emission scenario. Table I lists the models, the number of runs in each ensemble mean, and other details. We used four observational temperature series: two satellite-borne microwave sounding unit (MSU)-derived series and two balloon-borne radiosonde series. We use monthly data starting in 1979, covering the tropics from 20°N to 20°S. The MSU observations come from the University of Alabama-Huntsville (UAH; Spencer and Christy, 1990) and Remote Sensing Systems Inc. (RSS; Mears et al., 2003). The HadAT radiosonde series is an MSU-equivalent published on the Hadley Centre web site (http://hadobs.metoffice.com/hadat/msu_equivalents.html; Thorne et al., 2005). The Radiosonde Innovation Composite Homogenization (RICH) series is published by Haimberger et al. (2008) and is available at ftp://srvx6.img.univie.ac.at/pub/rich_gridded_2009.nc. We used the RICH-gridded data and MSU weights supplied by John Christy (personal communication) to construct MSU-equivalent series (see Supporting Information for details).

Table I. Summary of data series
| Panel | Model/obs name | Extra forcings | No. of runs | LT trend (SD) | MT trend (SD) | AR coeffs LT/MT |
|---|---|---|---|---|---|---|
| 1 | BCCR BCM2.0 | O | 1 | 0.210** | 0.211** | 1,2/1 |
| 5 | CSIRO3.0 | | 1 | 0.162* | 0.30 | 1,3/1,3 |
| 6 | CSIRO3.5 | | 1 | 0.305** | 0.288** | 1,2,6/1,2,6 |
| 7 | GFDL2.0 | O, LU, SO, V | 1 | 0.229** | 0.225** | 1,6/1,6 |
| 8 | GFDL2.1 | O, LU, SO, V | 1 | 0.188 | 0.193 | 1/1,4,5 |
| 9 | GISS_AOM | | 2 | 0.127 | 0.123 | 1/1 |
| 10 | GISS_EH | O, LU, SO, V | 6 | 0.277** | 0.261** | 1/1 |
| 11 | GISS_ER | O, LU, SO, V | 5 | 0.258** | 0.230** | 1,3,4,6/1,4 |
| 12 | IAP_FGOALS1.0 | | 3 | 0.273* | 0.259** | 1/1 |
| 13 | ECHAM4 | | 1 | 0.290** | 0.270** | 1,4/1 |
| 14 | INMCM3.0 | SO, V | 1 | 0.185** | 0.186** | 1,4,6/1,6 |
| 15 | IPSL_CM4 | | 1 | 0.203** | 0.202** | 1,3,6/1,3,6 |
| 16 | MIROC3.2_T106 | O, LU, SO, V | 1 | 0.100 | 0.102 | 1,6/1,6 |
| 17 | MIROC3.2_T42 | O, LU, SO, V | 3 | 0.280** | 0.284** | 1/1 |
| 21 | PCM_B06.57 | O, SO, V | 4 | 0.178* | 0.142** | 1,2,3/1,2 |
| 23 | HADGEM1 | O, LU, SO, V | 1 | 0.258** | 0.270** | 1/1 |
| 24 | UAH | | | 0.070 | 0.040 | 1,2/1,2 |
| 25 | RSS | | | 0.157** | 0.117* | 1,2/1,2 |
| 26 | HadAT | | | 0.097* | 0.020 | 1,2/1,2 |
| 27 | RICH | | | 0.114** | 0.072 | 1,2/1,2 |

  • Each row refers to model ensemble mean (rows 1–23) or observational series (rows 24–27). All models forced with twentieth century greenhouse gases and direct sulfate effects. Rows 10, 11, 19, 22, and 23 also include indirect sulfate effects. ‘Extra forcings’ column indicates which models included other forcing: ozone depletion (O), solar changes (SO), land use (LU), and volcanic eruptions (V). NA: information not supplied to PCMDI. ‘No. of runs’ indicates the number of individual realizations in the ensemble mean. LT and MT trends based on linear regression allowing six AR terms. Standard errors in brackets. AR coeffs: the AR lags that were significant (p < 0.05) for LT/MT layers, respectively.

  • * Significant at 10% level.

  • ** Significant at 5% level.

Our data start in January 1979 and end in December 2009. Thus, we have N = 27 panels, each with 372 monthly observations. Figure 1 displays the (smoothed) MSU series and the mean of the PCM model runs for comparison.

Figure 1.

UAH (thin dashed) and RSS (thin solid) satellite series 1979:1 to 2008:9. Thick line: Model 21 ensemble mean. Series smoothed using Hodrick–Prescott filter with smoothing parameter λ = 200. Top: LT and bottom: MT

Douglass et al. (2007) and Santer et al. (2008) focused on trends from 1979 to about 1999, with some series extending a few years further. To compare with these results, we first look at data ending in 1999, and then extend the sample to 2009. Since our panels are balanced, we can generate results using both the VF05 and panel regression methods, but since the results are so similar, we report only the VF05 results for the shorter 1979–1999 sample.

Table I summarizes the data. The 1979–2009 trends in °C per decade are shown for the LT and MT levels, with accompanying standard errors, for all ensemble means and observational series. Each series was centered and the trend regression allowed for a six-lag AR process, denoted as AR6. Table I (final column) shows that in 17 of the 23 models and in all 4 observational series, autocorrelation at lags greater than one was observed in at least one atmospheric layer. Hence, an AR1 error specification is likely inadequate. Extended autocorrelation lags were also observed in the individual model runs.

All climate models were forced with twentieth century greenhouse gas and sulfate levels: other assumed forcings are listed in Table I.

3.2. Multivariate trend test results

We weighted each model by the number of runs in its ensemble to adjust for the effect of combining runs into an average, although our conclusions would be unchanged if we weighted each model equally.

Table II presents tests of trend significance for the observational series. On data ending in 1999, the VF05 test shows the four observational series are insignificant at both the LT and MT layers individually and averaged together (column ‘Obs’). By extending the data to 2009, the F2 score of combined significance at the LT layer rises from 12.50 to 76.66, thus attaining significance at 5%. All observed LT series are individually significant, except UAH which is significant at 10%. At the MT layer, extending the sample raises the combined F2 score from 5.06 to 23.77, which is significant at 10%. UAH and Hadley series are insignificant, RICH is marginal, and RSS is individually significant at 5%.

Table II. Trend significance tests using nonparametric covariance estimator on balanced panels and panel regression on unbalanced panels
Tests of trend significance

| | Obs | MSU | UAH | RSS | BAL | HAD | RICH | Models |
|---|---|---|---|---|---|---|---|---|
| LT layer | | | | | | | | |
| VF method, 1979–1999 F2 | 12.50 | | 3.98 | 25.47* | | 7.85 | 15.79 | |
| VF method, 1979–2009 F2 | 76.66** | | 27.92* | 118.79** | | 55.16** | 93.12** | |
| Panel method 1979–2009, trend (°C per decade) | 0.110** | 0.120** | 0.079 | 0.159** | 0.105** | | | 0.272** |
| Panel method 1979–2009, standard error | 0.050 | 0.059 | 0.060 | 0.058 | 0.047 | | | 0.013 |
| Panel method 1979–2009, p | 0.027 | 0.042 | 0.186 | 0.006 | 0.026 | | | 0.000 |
| MT layer | | | | | | | | |
| VF method, 1979–1999 F2 | 5.06 | | 1.55 | 19.36 | | 0.27 | 10.08 | |
| VF method, 1979–2009 F2 | 23.77* | | 6.21 | 62.96** | | 0.26 | 41.43* | |
| Panel method 1979–2009, trend (°C per decade) | 0.057 | 0.079 | 0.041 | 0.117** | 0.043 | | | 0.253** |
| Panel method 1979–2009, standard error | 0.051 | 0.057 | 0.056 | 0.057 | 0.049 | | | 0.012 |
| Panel method 1979–2009, p | 0.272 | 0.166 | 0.466 | 0.039 | 0.389 | | | 0.000 |

  • VF method: Shown are Vogelsang and Franses (2005) F2 test scores. The 90% critical value is 20.14, 95% critical value is 41.53, and 99% critical value is 83.96. Panel method refers to panel regression results. Shown are the trend in °C per decade, the standard error of the trend, and the p value of a test of H0: trend = 0. See text for discussion of column groupings. Headings: Obs, average of all observational series; MSU, combined satellite record; UAH, University of Alabama-Huntsville; RSS, remote sensing systems; BAL, combined balloon (radiosonde) series; HAD, HadAT balloon series; RICH, Haimberger balloon series; Models, average of 23 ensemble means.

  • * Significant at 10% level.

  • ** Significant at 5% level.
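The significance markers in Table II follow from comparing each VF05 score with the critical values quoted in the table footnote (20.14, 41.53, and 83.96). A minimal sketch of that lookup is given below; the function name is hypothetical and the thresholds apply only to the single-restriction case reported here.

```python
# Critical values for the VF05 score with one restriction, as quoted
# in the footnote to Table II (90%, 95%, and 99% levels).
VF_CRITICAL = [(83.96, "1%"), (41.53, "5%"), (20.14, "10%")]


def vf_significance(score):
    """Return the strictest level at which the score exceeds its critical value."""
    for critical_value, level in VF_CRITICAL:
        if score > critical_value:
            return level
    return "not significant"


# Combined observational scores from Table II (LT and MT, both samples).
for score in (12.50, 76.66, 5.06, 23.77):
    print(score, vf_significance(score))
```

As noted in Section 2.3, scores that fall close to a critical value should be read with caution when autocorrelation is high.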

Trend comparison results are listed in Table III. The second column (‘Obs’) shows that at both the LT and MT layers, on data ending in 1999, the difference between models and observations is only marginally significant, echoing the findings of Santer et al. (2008). However, with the addition of another decade of data the results change, such that the differences between models and observations now exceed the 99% critical value. As shown in Table I and Section 3.3, the model trends are about twice as large as observations in the LT layer, and about four times as large in the MT layer.

Table III. Trend difference tests using nonparametric covariance estimator on balanced panels and panel regression on unbalanced panels
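Tests of difference from models (columns Obs to BAL) and of differences among observational series (last two columns)

| | Obs | MSU | UAH | RSS | BAL | RSS vs UAH | MSU vs BAL |
|---|---|---|---|---|---|---|---|
| LT layer | | | | | | | |
| VF method, 1979–1999 | 24.96* | | | | | 1990.10** | 4.51 |
| VF method, 1979–2009 | 188.55** | | | | | 399.85** | 2.06 |
| Panel (p) 1979–2009 | 0.002** | 0.012** | 0.002** | 0.059* | 0.001** | 0.000** | 0.880 |
| MT layer | | | | | | | |
| VF method, 1979–1999 | 35.48* | | | | | 1203.37** | 10.18 |
| VF method, 1979–2009 | 257.67** | | | | | 229.35** | 13.91 |
| Panel (p) 1979–2009 | 0.000** | 0.003** | 0.000** | 0.019** | 0.000** | 0.000** | 0.243 |

  • VF group results: Vogelsang and Franses (2005) F2 test scores, 90% critical value is 20.14, 95% critical value is 41.53, and 99% critical value is 83.96. Panel (p) refers to panel regression results. Shown are the p values of a test of whether indicated trend difference = 0. See text for discussion of column groupings. For description of headings, see footnote of Table II.

  • * Significant at 10% level.

  • ** Significant at 5% level.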

At both the LT and MT layers, on data ending in either 1999 or 2009, the VF05 tests show that the balloon data are not significantly different from the MSU data, but within the satellite category, the RSS and UAH data are significantly different. Possible reasons for RSS/UAH differences include treatment of intersatellite calibration, orbital decay, and other processing issues (Santer et al., 2005; Karl et al., 2006; Christy and Norris, 2009).

3.3. Panel regressions tests

In cases where one or more series is not of full length, the VF05 test will not work. The panel-corrected standard error estimator in the STATA program (command xtpcse) allows an unbalanced panel in the estimate of Equation (14); however, it imposes an AR1 assumption. For comparison purposes, we report these results on data ending in 2009. We again weighted each observation by the number of runs in the ensemble mean. None of the conclusions depend on this step.

In Table II, the panel estimator at the LT layer shows that the observations as a group (column 2) exhibit a significant trend of 0.110 °C per decade, compared to a model trend (column 9) of 0.272 °C per decade. The balloon and MSU series are each jointly significant (p = 0.026 and 0.042, respectively). In the MT layer, the model trend (0.253 °C per decade) remains significant. The mean observed trend is only 0.057 °C per decade. The panel-estimated standard error implies that it is insignificant (p = 0.272), while the VF05 score implies significance at 10%. Among observational series only RSS is individually significant, echoing the VF05 results. The MSU and balloon series are each jointly insignificant. Figures 2 and 3 show the trend magnitudes.

Figure 2.

Modeled and estimated trends (1979–2009, deg C per decade) in the tropics, LT layer. 95% confidence interval shown

Figure 3.

Modeled and estimated trends (1979–2009, deg C per decade) in the tropics, MT layer. 95% confidence interval shown

In Table III, the p values of the test scores on a hypothesis of equality between the indicated trends are shown in the bottom row. On data ending in 2009, the trend differences between models and observations (column 2) are significant in both the LT (p = 0.002) and MT (p = 0.000) layers, as was the case with the VF05 tests. The model-observation difference is significant for all data products at both layers, except for the RSS series in the LT layer (p = 0.059).

In the last columns of Table III, we test the differences among the observational series. As was the case with the VF05 tests, the balloons and MSU series are not significantly different from each other (p = 0.880), but within the MSU category, the RSS and UAH series are significantly different (p = 0.000).

4. Discussion and conclusions

Econometric tools are increasingly being used for climate data sets (Fomby and Vogelsang, 2002; Mills, 2010). We present two econometric methods for trend comparisons between data sets. Both add flexibility for multivariate comparisons and provide improved treatment of complex error structures. The multivariate testing method of Vogelsang and Franses (2005) yields a more robust estimator of the covariance matrix, but requires balanced data panels. Panel regression methods can accommodate comparisons of series of unequal lengths, but software limitations typically restrict treatment of within-panel autocorrelation to the AR1 case. In our example, the two methods yielded similar conclusions, indicating that the AR1 approximation in the panel model was likely not overly restrictive. In general, however, for the purpose of multivariate trend comparisons in climatology, we particularly recommend that the VF05 method enter the empirical toolkit.

In our example on temperatures in the tropical troposphere, on data ending in 1999, we find the trend differences between models and observations are only marginally significant, partially confirming the view of Santer et al. (2008) against Douglass et al. (2007). The observed temperature trends themselves are statistically insignificant. Over the 1979–2009 interval, in the LT layer, observed trends are jointly significant and three of four data sets have individually significant trends. In the MT layer, two of four data sets have individually significant trends and the trends are jointly insignificant or marginal depending on the test used. Over the interval 1979–2009, model-projected temperature trends are two to four times larger than observed trends in both the LT and MT and the differences are statistically significant at the 99% level.

Our methods assume that the trends are linear. We found no evidence for nonlinearity on the observed data, but some on modeled data in the MT. In addition, the fact that the results are sensitive to the end date suggests that they might also be sensitive to the start date. Since the satellite data are unavailable prior to 1979, we cannot extend these series earlier. Interpretation of trend comparisons should, therefore, make reference to the time period analyzed, which, ideally, should have some intrinsic interest. In this case, the 1979–2009 interval is a 31-year span during which the upward trend in surface data strongly suggests a climate-scale warming process. As noted in the studies cited in Section 1, comparing models to observations in the tropical troposphere is an important aspect of testing explanations of the origins of surface warming.