#### 2.1. Introduction: two-equation case

We assume that the data are stationary, though autocorrelated, upon detrending; in other words, ‘trend stationary.’ Suppose that there are two series of interest, *y*_{1τ} and *y*_{2τ}, where τ = 1, …, *T*. Trends are fitted using

$$y_{1\tau} = a_1 + b_1 \tau + u_{1\tau} \qquad (1)$$

and

$$y_{2\tau} = a_2 + b_2 \tau + u_{2\tau} \qquad (2)$$

A Student's *t*-test of slope equivalence is

$$t = \frac{\hat{b}_1 - \hat{b}_2}{\sqrt{\hat{s}_1^2 + \hat{s}_2^2 - 2\,\mathrm{cov}(\hat{b}_1, \hat{b}_2)}} \qquad (3)$$

where ∧ denotes an OLS estimate, *ŝ*²_{i} (*i* = 1,2) denotes an autocorrelation-robust variance estimator for *b̂*_{i}, and cov(*b̂*_{1}, *b̂*_{2}) is the estimated covariance between the trend terms.
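As an illustration, the quantities entering Equation (3) can be computed from two OLS trend fits. The sketch below uses simulated series with a shared noise component; the iid-based variance and covariance formulas (dividing the residual moments by η) are illustrative assumptions only, since in practice *ŝ*²_{i} should be autocorrelation-robust:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 120
t = np.arange(1, T + 1, dtype=float)

# Two correlated series with the same true trend slope (0.02)
shared = rng.normal(size=T)
y1 = 1.0 + 0.02 * t + shared + 0.5 * rng.normal(size=T)
y2 = 0.5 + 0.02 * t + shared + 0.5 * rng.normal(size=T)

def ols_trend(y, t):
    """Fit y = a + b*t + u by OLS; return slope and residuals."""
    X = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], y - X @ coef

b1, u1 = ols_trend(y1, t)
b2, u2 = ols_trend(y2, t)

eta = np.sum((t - t.mean()) ** 2)        # eta = sum of (t - tbar)^2
s2_1 = (u1 @ u1) / (T - 2) / eta         # var(b1) under iid errors (simplifying assumption)
s2_2 = (u2 @ u2) / (T - 2) / eta
cov12 = (u1 @ u2) / (T - 2) / eta        # contemporaneous cross-covariance of the trends

tstat = (b1 - b2) / np.sqrt(s2_1 + s2_2 - 2 * cov12)
```

Because the simulated series share a noise component, cov12 is positive, and omitting the −2cov term would inflate the denominator and understate the test score.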

Karl *et al.* (2006) drew attention to an apparent discrepancy between observed and model-generated temperature trends in the tropical atmosphere. Douglass *et al.* (2007) tested surface-matched differences (Supporting Information) using

$$d = \frac{\hat{b}_1 - \hat{b}_2}{\tilde{s}_1} \qquad (4)$$

where *b̂*_{1} denotes the trend through model ensemble means, *b̂*_{2} denotes the trend through observations, and *s̃*_{1} is the estimated standard error of *b̂*_{1}. The test (4) incorrectly treats the observations as deterministic and assumes the model observations are independent across time. Santer *et al.* (2008) instead used

$$d = \frac{\hat{b}_1 - \hat{b}_2}{\sqrt{\tilde{s}_1^2\left(\dfrac{1 + r_1}{1 - r_1}\right) + \tilde{s}_2^2\left(\dfrac{1 + r_2}{1 - r_2}\right)}} \qquad (5)$$

where ∼ denotes a least-squares estimate and *r*_{i} denotes the first-order autoregressive (AR1) coefficient in series *i*. The ratio of AR1 terms is commonly referred to as an ‘effective degrees of freedom’ adjustment (Santer *et al.* 2000). Instead of a series providing *T* independent observations, it is said to provide only (1 − *r*_{i})*T*/(1 + *r*_{i}) independent observations. The resulting variance corresponds to an estimate obtained using an AR1 model, but is not equivalent to that derived from higher order autocorrelation models. In addition, it does not yield a correct 2cov(*b̂*_{1}, *b̂*_{2}) term (Supporting Information), which was missing in both Equations (4) and (5). While detrended climate model projections may be uncorrelated with observations, the assumption of no covariance among trend coefficients implies that models have no low-frequency correspondence with observations in response to observed forcings, which seems overly pessimistic.
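The effective degrees of freedom adjustment can be sketched numerically. In the example below (the AR1 data-generating process and all variable names are assumptions for illustration), the lag-1 autocorrelation is estimated from detrended residuals and used to shrink the nominal sample size:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T, dtype=float)

# Simulate a trend plus AR1 errors with rho = 0.6
rho = 0.6
u = np.zeros(T)
eps = rng.normal(size=T)
for i in range(1, T):
    u[i] = rho * u[i - 1] + eps[i]
y = 0.02 * t + u

# Detrend by OLS and estimate the lag-1 autocorrelation of the residuals
X = np.column_stack([np.ones(T), t])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
r1 = (resid[1:] @ resid[:-1]) / (resid @ resid)

T_eff = T * (1 - r1) / (1 + r1)    # effective number of independent observations
inflation = (1 + r1) / (1 - r1)    # implied variance inflation for the trend estimate
```

With positive autocorrelation, T_eff is well below T, which is precisely why the naive iid standard error understates trend uncertainty.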

#### 2.2. Panel regressions

Equation (3) can be obtained using a panel regression. Suppose that the dependent variable is the stacked vector (*y*_{1}, *y*_{2})′, and we estimate the following equation:

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = a_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + a_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + b_1 \begin{pmatrix} \tau \\ \tau \end{pmatrix} + d_2 \begin{pmatrix} 0 \\ \tau \end{pmatrix} + e \qquad (6)$$

(1 1)′ denotes two stacked *T*-length vectors of ones. (0 1)′ denotes a vector of *T* zeros stacked on *T* ones. This is called an indicator or a ‘dummy variable,’ since it indicates (value = 1) if the dependent variable is *y*_{2}. (ττ)′ denotes a 2*T*-length vector consisting of two *T*-length time trends and (0 τ)′ is (ττ)′ times (0 1)′. A test of *d̂*_{2} = 0 in Equation (6) can be shown to be equivalent to testing *b̂*_{1} = *b̂*_{2} (Kmenta 1986; Supporting Information). Hence, the *t*-statistic on *d̂*_{2} in Equation (6) yields the test score (3).
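The equivalence between the stacked regression (6) and the two separate trend fits can be verified numerically. In this sketch (simulated data; all names are illustrative), the coefficient on the dummy-trend interaction reproduces the slope difference *b̂*_{2} − *b̂*_{1} exactly, because the fully interacted stacked model replicates the two separate OLS fits:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50
t = np.arange(1, T + 1, dtype=float)
y1 = 1.0 + 0.03 * t + rng.normal(size=T)
y2 = 2.0 + 0.05 * t + rng.normal(size=T)

# Separate OLS trend fits, as in Equations (1) and (2)
def slope(y):
    X = np.column_stack([np.ones(T), t])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_1, b_2 = slope(y1), slope(y2)

# Stacked regression (6): columns are (1 1)', (0 1)', (tau tau)', (0 tau)'
y = np.concatenate([y1, y2])
dummy = np.concatenate([np.zeros(T), np.ones(T)])
tau = np.concatenate([t, t])
X = np.column_stack([np.ones(2 * T), dummy, tau, dummy * tau])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef[2] recovers b_1; coef[3] is d2_hat, the slope difference b_2 - b_1
```

The *t*-statistic on coef[3], computed with a robust covariance estimator, then delivers test (3).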

To generalize the framework further, suppose that we are comparing *m* model-generated series and *o* observational series, making the total number of series *N* = *m* + *o*. Each source *i* yields *T*_{i} ≤ *T* nonmissing observations *y*_{iτ} over the interval τ = 1, …, *T*. Define an indicator variable obs_{iτ} = 0 if the record is model generated, and = 1 if it is from an observational series. Denote the *i*th vector as *y*′_{i} = [*y*_{i1}, …, *y*_{iT}]. Stack these vectors into a single *NT* × 1 vector **y** as follows:

$$\mathbf{y} = \left[\, y_1', \; y_2', \; \ldots, \; y_N' \,\right]' \qquad (7)$$

Stack the trend vector τ′ = [1, …, *T*] a total of *N* times to get the *NT* × 1 panel trend vector

$$\boldsymbol{\tau} = \left[\, \tau', \; \tau', \; \ldots, \; \tau' \,\right]' \qquad (8)$$

The indicator (dummy) variables are likewise stacked to form

$$\mathbf{obs} = \left[\, obs_1', \; \ldots, \; obs_N' \,\right]' \qquad (9)$$

where obs_{i} is (obs_{i1}, …,obs_{iT})′. The regression equation is then written as

$$\mathbf{y} = a_1 + a_2\,\mathbf{obs} + b_1\,\boldsymbol{\tau} + b_2\,(\mathbf{obs} \odot \boldsymbol{\tau}) + \mathbf{e} \qquad (10)$$

where **e** is an *NT* × 1 residual vector with typical element *e*_{iτ}. Note that all the ‘data’ are on the left-hand side, and the right-hand side consists of dummy variables and trend terms.

When obs_{iτ} = 0, d*y*_{iτ}/dτ = *b̂*_{1}, and when obs_{iτ} = 1, d*y*_{iτ}/dτ yields (*b̂*_{1} + *b̂*_{2}). Thus, a *t*-statistic on *b̂*_{1} will test whether the model trend is zero and a test of the linear restriction *b̂*_{1} + *b̂*_{2} = 0 indicates the significance of the observed slope. The *t*-statistic on *b̂*_{2} tests whether the trend on observations differs significantly from the trend in models.
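A minimal sketch of the stacked construction in Equations (7)-(10), assuming for illustration *m* = 3 model panels and *o* = 2 observational panels of simulated data with slightly different true slopes:

```python
import numpy as np

rng = np.random.default_rng(3)
T, m, o = 40, 3, 2
t = np.arange(1, T + 1, dtype=float)

# m model panels (true slope 0.02), o observational panels (true slope 0.01)
panels = [0.02 * t + rng.normal(size=T) for _ in range(m)]
panels += [0.01 * t + rng.normal(size=T) for _ in range(o)]
obs_flags = [0] * m + [1] * o

y = np.concatenate(panels)                    # Eq. (7): NT x 1 stacked data
tau = np.tile(t, m + o)                       # Eq. (8): stacked trend vector
obs = np.repeat(obs_flags, T).astype(float)   # Eq. (9): stacked indicator

# Eq. (10): y = a1 + a2*obs + b1*tau + b2*(obs*tau) + e
X = np.column_stack([np.ones_like(y), obs, tau, obs * tau])
a1, a2, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
# model trend: b1; observed trend: b1 + b2
```

Here b1 recovers the common model trend and b1 + b2 the observed trend, up to sampling noise; inference on these coefficients still requires the robust covariance matrix discussed below.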

Equation (10) can be extended further. Suppose that observations come from two different systems, such as satellites and weather balloons. Define two different indicator variables: **d**_{1}, which is equal to 1 if an observation comes from either system 1 or 2, and **d**_{2}, which is equal to 1 only if the observation comes from system 2. The regression equation becomes

$$\mathbf{y} = a_1 + a_2\,\mathbf{d}_1 + a_3\,\mathbf{d}_2 + b_1\,\boldsymbol{\tau} + b_2\,(\mathbf{d}_1 \odot \boldsymbol{\tau}) + b_4\,(\mathbf{d}_2 \odot \boldsymbol{\tau}) + \mathbf{e} \qquad (11)$$

The estimated model trend is *b̂*_{1}. The trend in observations from system 1 is *b̂*_{1} + *b̂*_{2} and from system 2 is *b̂*_{1} + *b̂*_{2} + *b̂*_{4}. The *t*-statistic on *b̂*_{4} tests whether the trend in the second observation system differs from that in the first, and so forth.
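The two-system specification (11) can likewise be sketched with simulated data (panel names and true slopes below are illustrative assumptions); the interaction coefficients recover the trend offsets just described:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 60
t = np.arange(1, T + 1, dtype=float)
y_model = 0.02 * t + rng.normal(size=T)    # model panel, slope 0.02
y_sys1 = 0.015 * t + rng.normal(size=T)    # observing system 1, slope 0.015
y_sys2 = 0.010 * t + rng.normal(size=T)    # observing system 2, slope 0.010

y = np.concatenate([y_model, y_sys1, y_sys2])
d1 = np.concatenate([np.zeros(T), np.ones(T), np.ones(T)])    # 1 for any observation
d2 = np.concatenate([np.zeros(T), np.zeros(T), np.ones(T)])   # 1 for system 2 only
tau = np.tile(t, 3)

# Eq. (11): intercept dummies plus trend interactions for each system
X = np.column_stack([np.ones(3 * T), d1, d2, tau, d1 * tau, d2 * tau])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2, b4 = coef[3], coef[4], coef[5]
# model trend: b1; system 1 trend: b1 + b2; system 2 trend: b1 + b2 + b4
```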

Hypothesis testing requires a valid estimator of *V*(**b**), the covariance matrix of **b**. The general form is (Davidson and MacKinnon 2002)

$$V(\mathbf{b}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\Omega\,\mathbf{X}\,(\mathbf{X}'\mathbf{X})^{-1} \qquad (12)$$

where **X** denotes the right-hand side variables in Equation (11) and Ω = *E*(**ee**′). Obtaining a valid estimate of Ω requires modeling the cross- and within-panel covariances. For a panel *i* with *T* observations, define a matrix **A**_{i} of AR weights using the panel-specific AR1 coefficient ρ_{i}:

$$\mathbf{A}_i = \begin{pmatrix} 1 & \rho_i & \rho_i^2 & \cdots & \rho_i^{T-1} \\ \rho_i & 1 & \rho_i & \cdots & \rho_i^{T-2} \\ \vdots & & \ddots & & \vdots \\ \rho_i^{T-1} & \rho_i^{T-2} & \cdots & \rho_i & 1 \end{pmatrix} \qquad (13)$$
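Assuming **A**_{i} is the standard AR1 correlation matrix, with (*s*, *t*) element ρ_{i}^{|s−t|} (a common construction; the paper's exact weighting scheme may differ), it can be built directly:

```python
import numpy as np

def ar1_corr(rho, T):
    """AR1 correlation matrix: (s, t) element equals rho**|s - t|."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

A = ar1_corr(0.7, 5)   # illustrative panel-specific rho_i = 0.7, T = 5
```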

#### 2.3. Higher order autocorrelations and multivariate trend models

Vogelsang and Franses (2005, herein VF05) derived two estimators for Ω that impose no parametric restrictions on the lag and correlation structure, in contrast to the AR1 structure imposed in Equation (14). Suppose that the *N* panels are used one at a time in Equation (1), yielding OLS trend estimates *b̂* = (*b̂*_{1}, …, *b̂*_{N})′. Take the *N* residual series *u*_{1τ}, …, *u*_{Nτ} and form the *T* × *N* matrix **U** = [*u*_{1}, …, *u*_{N}]. VF05 derive two transformations of **U** that converge in probability to a scalar multiple of Ω. Of their two estimators, we focus on the form with higher power, which is also slightly easier to compute. It is obtained as follows. Denote **V** = **U**′ and take the columns *v*_{j}, for *j* = 1, …, *T*, each of length *N*. Define the partial-sum vectors *ŝ*_{j} = *v*_{1} + ⋯ + *v*_{j}. Then, VF05 show that

$$\hat{\Omega} = \frac{2}{T^2} \sum_{j=1}^{T} \hat{s}_j\,\hat{s}_j' \qquad (15)$$

converges in probability to a scalar multiple of Ω, regardless of the form of autocorrelation and other departures from the independence assumption. For testing purposes, linear restrictions on the slopes can be written in the matrix form **R***b̂* = **0** (Supporting Information). The VF05 test statistic is

$$VF = \frac{(\mathbf{R}\hat{b})'\left[\mathbf{R}\,\hat{\Omega}\,\mathbf{R}'/\eta\right]^{-1}(\mathbf{R}\hat{b})}{q} \qquad (16)$$

where η = Σ(*t* − *t̄*)^{2} and *q* is the number of restrictions, which in our examples is always equal to 1. Critical values for Equation (16) generated by Monte Carlo simulation are reported in VF05.
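A structural sketch of a partial-sum estimator in the spirit of Equation (15) follows. The exact scaling constant in VF05 may differ from what is shown; the point of the sketch is only that the construction, built from outer products of cumulative residual sums, yields a symmetric positive semi-definite *N* × *N* matrix without any parametric model of the autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 100, 3
U = rng.normal(size=(T, N))   # stand-in for detrended residuals, one column per panel
U -= U.mean(axis=0)           # demean each residual column

# Partial sums s_j = v_1 + ... + v_j over the T rows v_j (each of length N)
S = np.cumsum(U, axis=0)

# Partial-sum covariance estimator, up to the scalar factor supplied
# by the VF05 limit theory (assumed 1/T^2 scaling here for illustration)
Omega_hat = (S.T @ S) / T**2
```

Because Omega_hat is a sum of outer products, it is positive semi-definite by construction, so the quadratic form in Equation (16) is always well defined.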

The VF05 approach improves on the panel method by providing robust trend variances and covariances regardless of the autocorrelation order and the structure of heteroskedasticity. However, it requires balanced panels, which can be a limitation in some cases.

The VF05 statistic, as with all test statistics, has improved size as the sample size increases. Rejection probabilities also increase as ρ approaches 1. Monte Carlo simulations in VF05 show that for *T* = 100, when *q* = 1 and ρ > 0.8, just under 10% of scores exceed the 95th percentile, indicating a tendency to over-reject a true null, although this is an improvement compared to earlier alternatives. Each panel in our full sample has well over 100 observations, but a high ρ value. Hence, VF05 scores that are close to the critical values may overstate significance.